PREFACE

This  report,  the  first  in  a  new  series  entitled  Small-Area  Statistics  Papers,  Series  GE-41,  contains  the  papers  presented 
at  the  Conference  on  Small-Area  Statistics  in  St.  Louis,  Mo.,  on  August  26,  1974,  during  two  sessions  of  the  annual 
meeting  of  the  American  Statistical  Association  (ASA). 

Formerly  the  papers  of  this  conference  were  published  in  Census  Tract  Papers,  Series  GE-40.  A  new  series  has  been 
created  to  indicate  that  the  papers  cover  all  types  of  small-area  statistics,  not  just  census  tract  statistics. 

The  first  session  of  the  1974  conference,  entitled  Innovations  in  Intercensal  Estimates  for  Small  Areas,  was 
sponsored  by  the  ASA  Committee  on  Small-Area  Statistics  and  organized  by  Albert  Mindlin,  past  chairman  of  the 
Committee.  Robert  C.  Klove  served  as  chairman,  and  the  speakers  were  Leo  A.  Schuerman,  W.  Nelson  Rasmussen,  Peter 
K.  Francese,  and  Gangu  K.  Ahuja. 

The  second  session,  entitled  Microdata  Bases:  Public  Data  Files  for  Socioeconomic  Research,  was  sponsored  by  the 
ASA  Business  and  Economic  Statistics  Section,  the  ASA  Social  Statistics  Section,  and  the  ASA  Committee  on 
Small-Area  Statistics.  It  was  organized  and  chaired  by  John  C.  Beresford,  the  incoming  chairman  of  the  Committee  on 
Small-Area  Statistics.  The  speakers  included  David  A.  Hirschberg,  Paul  T.  Zeisset,  Arthur  L.  Hauser,  and  Steven  M. 
Rudolph.  Discussants  were  Harold  Beebout,  Kathryn  P.  Nelson,  Trudi  Lucas,  and  Lawrence  L.  Brown,  III. 

This  report  was  organized  and  prepared  under  the  direction  of  Robert  C.  Klove,  Geographic  Research  Adviser, 
Statistical  Research  Division,  Bureau  of  the  Census. 


CONTENTS 


INNOVATIONS IN INTERCENSAL ESTIMATES FOR SMALL AREAS

Introduction
Albert Mindlin
Government of the District of Columbia

Combining Ratio-Correlation and Composite Methods for Intercensal Social and Economic Small-Area Estimates
Leo A. Schuerman, E. Wayne Hansen, and Charles Hubay
Bureau of the Census

The Use of Driver License Address Change Records for Estimating Interstate and Intercounty Migration
W. Nelson Rasmussen
California Department of Finance

Estimating Population for Census Tracts
Peter K. Francese
National Planning Data Corporation, Ithaca, N.Y.

Expansion of the Component Method to Estimates of Population Subgroups in Neighborhoods
Gangu K. Ahuja
Government of the District of Columbia

MICRODATA BASES: PUBLIC DATA FILES FOR SOCIOECONOMIC RESEARCH

Introduction
John C. Beresford
Data Use and Access Laboratories, Arlington, Va.

Census Data for General Revenue Sharing
Steven M. Rudolph
Bureau of the Census

General Revenue Sharing as Data User and Data Producer
Arthur L. Hauser
Office of Revenue Sharing

Comments on Public Data Files for Socioeconomic Research
Trudi Lucas, National Science Foundation
Lawrence L. Brown, III, Data Use and Access Laboratories, Arlington, Va.

The Continuous Work History Sample
David A. Hirschberg
Social Security Administration

Comments on Problems and Opportunities with Social Security Sample Data
Kathryn P. Nelson
Oak Ridge National Laboratory

Microdata from the Current Population Survey
Paul T. Zeisset and Larry W. Carbaugh
Bureau of the Census

Comments on Using Current Population Survey Data Files
Harold Beebout
Mathematica, Inc., Washington, D.C.




Innovations in Intercensal Estimates for Small Areas


Introduction 

Albert  Mindlin 
Government  of  the  District  of  Columbia 


The dramatic decline in births in recent years, occurring at
different rates in different places so that national change rates can
no longer be assumed to hold for local areas, has created a crisis in
intercensal population estimating. Methodology that depends on
birth rate stability has become fairly widespread but is no longer
considered reliable. At the same time the spread of the social
indicator movement has stimulated efforts to obtain for small areas
intercensal estimates of socioeconomic characteristics other than
the usual age-sex-color population trees. These developments have
produced explorations of new methodologies for small-area
demographic estimates, especially regression models; expansion of
older methodologies such as the "component" methods; and deeper
explorations of alternative local-area data sources such as auto
registrations and other hypothesized "symptomatic" variables of
population changes.

The first session of the 1974 Conference on Small-Area
Statistics of ASA was devoted to descriptions of some of the
efforts to cope with the crisis in population estimating and the
challenge presented by the social indicator movement. The most
elaborate paper was that given by Schuerman, Hansen, and Hubay. It
describes a multiple regression model to estimate various
socioeconomic characteristics of residents of small areas, such as census
tracts, in intercensal years. The independent variables in this model
are characteristics obtained in regular operations of local
government, so that the model can be repeated regularly, such as
annually. A unique feature of the model is that tracts are grouped
into strata according to their stage of residential development (nine
categories, ranging from start of residential development through
maturation to decay and urban renewal). The independent
variables are not necessarily the same for each stratum, but they are
selected to maximize the multiple correlation coefficient with the
dependent variable in the census year, i.e., a year when it is known.
The strata equations are then combined into a single data set.
Various tests of model stability are described.

Rasmussen  reports  on  the  use  of  driver  license  address  changes 
to  estimate  both  in-  and  out-migration  separately  for  ages  18-64  by 
county.  There  are  several  advantages  to  this  file  over  other 
administrative  records.  There  are  also  pitfalls.  His  paper  delineates 
these  and  describes  tests  of  various  hypotheses  about  the  file  based 
mainly  on  comparisons  with  other  records  such  as  Federal  income 
tax  records  and  the  1970  census  data.  The  paper  also  outlines  how 
the  file  is  actually  used  to  produce  population  estimates. 

Ahuja, explicitly recognizing the breakdown of the composite
method of intercensal population estimates because of the
unhinging of birth rates, turns to the component method, which does
not depend on birth rates. He describes briefly the expansion of
the component method to age-sex-color specific population
estimates for a city, a procedure not heretofore attempted.

Francese  presents  a  brief  statement  of  the  program  of  the 
National  Planning  Data  Corporation  to  make  intercensal  household 
population  estimates  by  census  tract.  The  methodology  appears  to 
be  a  regression  model  that  mixes  1960  and  1970  census  data  with 
local  counts  of  telephone  and  auto  registrations.  The  model 
generates  a  "growth  rate"  factor  for  each  census  tract.  This  is 
applied  to  the  census  year  population  for  an  intercensal  estimate. 
The  procedure  is  restricted  to  total  household  population,  without 
age-sex-color  refinement. 

Taken  together,  these  papers  represent  a  cross-section  of  current 
efforts  to  improve  intercensal  demographic  and  socioeconomic 
characteristics  for  small  areas,  efforts  that  are  still  very  much  in 
flux  and  developing  rapidly  both  in  methodology  and  in  creation 
of  improved  input  data. 


INTERCENSAL  ESTIMATES 


Combining  Ratio-Correlation 
and  Composite  Methods  for 
Intercensal  Social  and  Economic 
Small-Area  Estimates 

Leo A. Schuerman, E. Wayne Hansen,
and  Charles  Hubay 
Bureau  of  the  Census 

INTRODUCTION 

The focus of this paper is twofold. First, some of the problems
of population estimation for small areas are briefly reviewed.
Second, procedures being investigated by Census Use Study to cope
with problems specific to small-area estimation are discussed.

In  order  to  set  the  proper  perspective  of  this  discussion,  we 
should  first  indicate  what  we  mean  by  small-area  estimates.  In  the 
context  of  this  paper,  small  areas  refer  to  areal  units  similar  to 
those  for  small-area  aggregates  of  persons  such  as  the  Census 
Bureau's  census  tract  delineations.  Furthermore,  it  should  be  noted 
here  that  the  methodology  is  still  in  a  research-and-development 
stage.  The  procedures  being  investigated  are  an  outgrowth  of  a 
need  to  obtain  different  subcategories  as  "populations  at  risk" 
which  can  be  used  as  denominators  in  index  construction  and/or 
can  be  used  as  estimates  of  socioeconomic  characteristics.1  Thus, 
the  principal  aim  is  not  to  produce  estimates  of  a  total  population 
per  se;  rather  it  is  to  estimate  components  of  a  population  in 
metropolitan  subareas.  As  an  example  of  this  different  emphasis, 
the  discussion  that  follows  will  center  around  our  current  work  on 
estimating  the  percent  of  the  "families  in  poverty"  for  areal  units 
comparable  to  census  tracts  in  Los  Angeles  County.2 

BASIC  ESTIMATION  PERSPECTIVE 

Because of the need for estimates of specific demographic
characteristics, the traditional procedures reported in the literature
were found to be unsuitable.3 The approach had to have a flexible
methodology so that it would be applicable for different population
compositions. Thus, it was important in the development of
the estimating procedures that:

1. The estimation methodology developed would consist of
a set of generalized procedural steps; and

2. It would be the selection of input and criterion variables
that make the technique specific to a selected subcategory of
the population.

1 Samuel Korper, et al., "Composite Social Indicators for Small
Areas - Census Use Study - Recent Developments in Methodology and Uses,"
in U.S. Bureau of the Census, Census Tract Papers, Series GE-40, No. 9,
Social Indicators for Small Areas, paper presented at the Conference on
Small-Area Statistics, American Statistical Association, Montreal, Canada,
August 14, 1972; Washington, D.C., U.S. Government Printing Office, 1973,
pp. 18-23. U.S. Bureau of the Census, Census Use Study, Social and Health
Indicators System: Los Angeles; Washington, D.C., U.S. Government
Printing Office, 1973, pp. 68-82.

2 Principal support funds for poverty estimation were provided by the
Manpower Administration, Department of Labor, Office of Policy
Coordination and Research.

3 For a discussion of the special problems see Leo A. Schuerman,
"Population Composition Estimation: A Working Paper for Local-Area
Estimation," in Urban and Regional Information Systems: Information
Research for an Urban Society, proceedings for 1973 conference, Atlantic
City, N.J.; Claremont, Calif., Claremont College Printing Service, 1974.

Implicit in the design of the generalized procedures are
additional restrictions which we felt would enhance the usefulness of
the methodology for planning and research at the level of local
governments and universities. The following considerations were of
particular importance: That the general methodology be
transferable to metropolitan areas other than the test sites; that the
input data base not be restricted to a few specific common files;
and that the model be stratified according to particular needs of
local communities. The parameters of the restrictions are
summarized as follows:

1.  Only  routine  and  readily  available  administrative  files 
should  be  needed; 

2.  Procedures  should  be  able  to  utilize  the  kinds  of  data  files 
found  in  different  local  areas; 

3.  Model  design  should  fit  units  smaller  than  county  or  city 
level  areal  units;  and 

4.  Estimates  should  be  obtainable  on  at  least  a  yearly  basis. 

The basic perspective of the procedures being developed at
Census Use Study (CUS) is a derivation of the Bogue-Duncan
Composite Method4 expressed through statistical regression
equations. The latter analytical approach, of course, is the basic
mathematical format of ratio-correlation estimates.5 Thus, the
procedures start with the assumption that if a strong concomitant
relationship (as a practical criterion, if 80 percent of the variance is
accounted for) can be established between a set of symptomatic
variables and a known benchmark measure at one point in time, the
symptomatic variables can be used as proxies to estimate the
specific component of population at different time periods.6 The
benchmark measure, a known criterion, is essential because it
provides a specific reference point in time and space. This is the
variable which is eventually to be estimated.
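The benchmark-and-proxy logic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Census Use Study procedure: the function names (`fit_benchmark`, `r_squared`, `estimate`) and all tract values are invented, ordinary least squares is solved through the normal equations, and the stepwise selection of variables is left out. The 0.80 check mirrors the practical criterion quoted in the text.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_benchmark(rows, y):
    """Least-squares coefficients (intercept first) for the benchmark year."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def estimate(coefs, row):
    """Apply the benchmark-year equation to one tract's symptomatic values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(coefs, rows, y):
    """Share of benchmark variance accounted for by the equation."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - estimate(coefs, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical benchmark-year data: two symptomatic variables per tract and
# the known census measure (e.g., percent of families in poverty).
symptomatic_1970 = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
poverty_1970 = [3.2, 6.9, 6.1, 10.8, 9.0]

coefs = fit_benchmark(symptomatic_1970, poverty_1970)
if r_squared(coefs, symptomatic_1970, poverty_1970) >= 0.80:
    # Only then use the symptomatic values of a later year as proxies.
    estimate_1971 = estimate(coefs, (3, 4))
```

The estimate for the later year rests on the assumption, stated in the text, that the relationship found in the benchmark year continues to hold.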

Throughout this paper symptomatic variables are meant to be
measures which are derived from local data files because of their
theoretical import to the desired estimate. In this capacity they are
conceptually expected to have a consistent and substantial
relationship over time. The initial equation is usually in the form of a
multiple stepwise regression equation which draws from a large
pool of possible symptomatic variables. As such, it would be
expected that any one symptomatic variable is only imperfectly
correlated to a particular characteristic. However, by using a
number of variables in conjunction with a final multiple regression
equation, a composite set can be tailored to estimate a specific
benchmark component. Furthermore, if we assume that the
distributions on both sides of the equal sign of the equation are more or
less stable, the final tailored equation can be used to obtain gross
estimations for the years other than that of the benchmark. A
more detailed discussion of the procedural advantages of a
regression approach can be found in "Population Composition
Estimation: A Working Paper of Local Area Estimation" and Social
and Health Indicators Systems: Los Angeles.7

4 Donald J. Bogue and Beverly Duncan, "A Composite Method for
Estimating Intercensal Population of Small Areas by Age, Sex, and Color,"
in Vital Statistics - Special Reports, Vol. 47, August 1959.

5 Albert Crosetti and Robert C. Schmitt, "A Method of Estimating the
Intercensal Population of Counties," in Journal of the American Statistical
Association, Vol. 51, 1956, pp. 587-590. Donald E. Purcell, "Improving
Population Estimates with the Use of Dummy Variables," in Demography,
Vol. 7, February 1970, pp. 87-91. Harry Rosenberg, "Improving Current
Population Estimates Through Stratification," in Land Economics, Vol. 44,
August 1968, pp. 331-338. Robert C. Schmitt, "An Application of Multiple
Correlation to Population Forecasting," in Land Economics, Vol. 30, August
1954, pp. 227-279.

6 A similar orientation was explored in reports prepared by Peter A.
Morrison: Demographic Information for Cities: A Manual for Estimating and
Projecting Local Population Characteristics, R-618-HUD, the Rand
Corporation, June 1971; and Small-Area Population Estimates for the City of St.
Louis, 1960-72, with a Model for Updating Them, R-1373-NSF, the Rand
Corporation, September 1973.


POVERTY  ESTIMATES 

When the model is applied to specifications that consist of the
more traditional demographic categories (e.g., age, sex, race-ethnic
characteristics), the inputs to the equations are relatively
straightforward. Other structural compositions of populations, however,
need operational specifications that satisfy both conceptual and
measurable requirements of the estimate model. For example,
when estimating a social condition such as poverty, the definition
must result in a clear and interval measure that represents the
estimate for the benchmark year.

Poverty is specified in the current estimate program to reflect
local urban conditions which include two complementary
dimensions: type of family income and subsistence income for families of
different sizes. In the second dimension, an adequate family
standard of living is dependent upon the concordance between the
amount of family income and the number in the family.
Encompassed within both measures is the situation where life survival is
defaulted to a fixed amount below or just at subsistence level.
These two measures are particularly useful because they are
obtainable from the 1970 Census of Population and, as such, they can
readily fit the benchmark measurement criterion.

Type  of  Family  Income 

It is almost axiomatic to discussions within the literature
concerned with poverty that specific types of family income are
universal symptoms of families forced to exist at a subsistence level.
The type of income categories usually consists of certain limited
and/or fixed retirement benefits or welfare incomes provided by
the State. Thus the actual measure of the first dimension of
poverty consisted of the percentage in a census tract of the families
with income which received their family income from public
assistance or welfare, Social Security, and railroad retirement.8


Income  by  Family  Size 

The second poverty dimension measures the number of persons
in a family that must live on a given income, which reflects how
well the particular family unit can compete for goods and services
in the marketplace.9 The final measure is obtained from a
two-dimensional matrix produced from a special census tract tabulation
of the 1970 Census of Population. The format of the tabulation for
the County of Los Angeles is shown in table 1.

7 Schuerman, op. cit., and Census Use Study, op. cit.

8 Jules Henry, Culture Against Man, New York, Random House, 1963,
pp. 406-441. Patrick M. Horan and Patricia Lee Austin, "The Social Bases of
Welfare Stigma," in Social Problems, Vol. 21, June 1974, pp. 648-657. S. M.
Miller, Martin Rein, Pamela Roby, and Bertram M. Gross, "Poverty,
Inequality, and Conflict," in The Annals of the American Academy of
Political and Social Science, September 1967, pp. 18-52.

To take into account standard-of-living differentials for various
parts of the country, the minimum levels of income for different
family sizes are referenced to the particular standard metropolitan
statistical areas (SMSA's).10 In this paper, then, the SMSA is the
county of Los Angeles. The income/family-size distribution of the
county is shown in table 1.

To  obtain  a  single  criterion  of  measurement  for  each  census 
tract,  the  following  operations  were  performed: 

1. For each column in table 1, the lowest income quartile
was determined.11

2. The highest of these quartile family incomes was identified
as the "watershed" from which to determine the income level for
each additional unit in family size. Thus, in table 1, this level
would be $8,999, shown in the column for 5 persons.

3. For each additional member in the family unit category
above the column with the watershed, an added $1,000 to the
watershed value was charted. This is shown in table 1 by a
dotted line.12

4.  These  standard  cut-off  thresholds  of  the  metropolitan 
area  were  applied  to  the  census  tract  level. 

5. Finally, to obtain a single percentage rate of families
below the minimum level of income, the families below the
cut-off threshold were summed and the sum was divided by
the total number of families in the census tract.
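The five operations above can be sketched as follows. This is a simplified reading and not the authors' program: the function names and the small tract in the comments are hypothetical, the quartile is taken as the upper bound of the income bracket that contains the 25th percentile (the paper rounds to the nearest $1,000 instead), and the watershed column is passed in rather than searched for.

```python
def lowest_quartile_bracket(counts, upper_bounds):
    """Upper bound of the income bracket that contains the 25th percentile."""
    quarter = sum(counts) / 4
    running = 0
    for count, upper in zip(counts, upper_bounds):
        running += count
        if running >= quarter:
            return upper
    raise ValueError("empty distribution")


def cutoff_thresholds(quartiles, watershed_size, step=1000):
    """Step 3: add `step` dollars per family member above the watershed column.

    quartiles maps family size -> lowest-quartile income for that column.
    """
    watershed = quartiles[watershed_size]
    thresholds = {}
    for size, quartile in quartiles.items():
        if size <= watershed_size:
            thresholds[size] = quartile
        else:
            thresholds[size] = watershed + step * (size - watershed_size)
    return thresholds


def percent_below(tract, thresholds):
    """Steps 4 and 5: share of a tract's families at or below the cut-offs.

    tract maps family size -> list of (bracket upper bound, families).
    """
    below = sum(
        families
        for size, rows in tract.items()
        for upper, families in rows
        if upper <= thresholds[size]
    )
    total = sum(families for rows in tract.values() for _, families in rows)
    return 100.0 * below / total


# The 5-person column of table 1, with every bracket above $8,999 lumped together.
five_person = [3318, 2567, 3358, 4790, 5040, 5983, 7600, 9950, 11127, 144574]
bounds = [999, 1999, 2999, 3999, 4999, 5999, 6999, 7999, 8999, float("inf")]
watershed = lowest_quartile_bracket(five_person, bounds)
```

With the 5-person column as the watershed, `cutoff_thresholds` raises the cut-off by $1,000 for each larger family size, which is the stepped line the text describes.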

The two different facets of poverty were regressed against a set
of 46 symptomatic measures shown in tables 2 and 3. These
measures were derived from census tract aggregates of birth and death
vital records, reportable disease, mental health files, and adult and
juvenile probation records. The symptomatic variables which
would provide gross estimates according to the procedures
described earlier are identified by "x" in the total columns of tables 2
and 3.


RELIABILITY  CONTROLS 

Implicit in the regression technique summarized above are the
assumptions that a cross-sectional, distributional relationship is
more or less uniform throughout the universe of territory and that
the relationship will remain the same across time. Yet we know
that such relationships between variable sets do not necessarily
remain stable, because of either fundamental changes in the
phenomenon or errors in the input data. For these reasons, we noted
earlier that initial regression equations indicate only gross estimates
of the census tract characteristics. Research at CUS has indicated


9 Oscar Ornati, "Poverty in America," in Louis A. Ferman, et al. (eds.),
Poverty in America, Ann Arbor, University of Michigan Press, 1968, p. 25.
Mollie Orshansky, "Counting the Poor: Another Look at the Poverty
Profile," in Social Security Bulletin, January 1965, pp. 3-13.

10 If the metropolitan levels are lower than the U.S. average, the level of
the United States should perhaps be used as the reference.

11 Other minimum levels were tested; however, due to the increments of
the income breaks, the quartile showed the most useful and consistent
results.

12 Procedures 2 and 3 were felt necessary to adjust for the conditions
where a large proportion of the low-income groups in any area generally have
large families.




Table 1.  Household Size by Household Income:  County of Los Angeles

(Data shown as numbers of households)

Household income                        Household size (persons)
(dollars)             Total        2        3        4        5        6        7        8        9    10 or
                                                                                                         more

All incomes       1,769,331  670,343  376,457  333,413  198,307  103,320   54,823   16,791    8,099    7,778

Less than 1,000      38,311   16,937    8,524    5,954    3,318    1,705    1,261      299      141      172
1,000 to 1,999       39,519   21,871    7,684    4,966    2,567    1,229      858      171       98       75
2,000 to 2,999       59,244   33,536   11,662    7,555    3,358    1,580      990      324      137      102
3,000 to 3,999       70,710   40,293   11,648    6,890    4,790    3,750    2,336      601      211      191
4,000 to 4,999       74,445   39,974   14,562    8,496    5,040    2,625    1,979      846      521      402

5,000 to 5,999       82,125   40,520   17,536   10,841    5,983    3,455    2,079      752      415      544
6,000 to 6,999       91,917   41,376   20,390   13,597    7,600    4,273    2,744      911      472      554
7,000 to 7,999      103,494   43,230   22,654   16,253    9,950    5,689    3,552    1,127      568      471
8,000 to 8,999      108,779   42,665   24,052   19,067  *11,127    6,117    3,512    1,247      510      482
9,000 to 9,999      109,277   39,916   23,205   21,134   12,532    6,543    3,718    1,223      532      474

10,000 to 10,999    118,444   40,147   24,996   24,755   14,926    7,383    3,893    1,265      601      478
11,000 to 11,999    102,356   33,886   21,924   21,673   12,755    6,559    3,591      946      507      515
12,000 to 12,999    104,653   33,475   22,217   23,197   13,510    6,818    3,385    1,119      487      445
13,000 to 13,999     86,930   27,410   18,804   19,101   11,398    5,776    2,865      863      360      353
14,000 to 14,999     77,272   24,340   16,623   16,635   10,443    5,368    2,453      752      377      281

15,000 to 15,999     72,275   21,844   15,790   16,248    9,697    4,881    2,421      686      353      355
16,000 to 16,999     60,411   18,813   13,651   12,803    8,227    4,002    1,928      485      260      242
17,000 to 17,999     50,284   15,194   10,986   11,655    6,575    3,448    1,454      488      233      251
18,000 to 18,999     44,513   13,524    9,769    9,749    6,056    3,227    1,479      369      182      158
19,000 to 19,999     35,709   10,466    8,042    8,282    4,762    2,447    1,055      338      141      176

20,000 to 20,999     34,566   10,021    7,574    8,049    4,983    2,294    1,020      298      158      169
21,000 to 21,999     24,932    7,109    5,507    5,832    3,485    1,689      820      268      103      119
22,000 to 22,999     22,008    6,329    5,037    4,925    3,095    1,519      670      182      148      103
23,000 to 23,999     16,892    4,759    3,709    4,089    2,187    1,194      638      136       97       83
24,000 to 24,999     14,838    4,338    3,338    3,214    2,269    1,049      438       98       51       43

25,000 to 29,999     49,599   14,230   11,229   11,207    7,021    3,460    1,536      442      240      234
30,000 to 34,999     24,363    7,753    4,837    5,596    3,545    1,597      702      161       64      108
35,000 to 39,999     13,766    4,116    3,017    3,350    1,802      939      378      108       35       21
40,000 to 44,999      9,983    3,130    2,004    2,192    1,468      759      272       61       33       64
45,000 to 49,999      6,262    2,003    1,307    1,363      849      492      195       45        0        8
50,000 or more       21,454    7,138    4,179    4,745    2,989    1,453      601      180       64      105

* Watershed value.

Annotations on the table: lowest quartile rounded to nearest $1,000; $1,000 increment for each additional person above watershed.


that  further  refinement  of  the  basic  equation  is  required  to  cope 
with  estimate  reliability  over  time.  Also  statistical  methodology  has 
to  be  applied  to  the  input  and  output  variables  to  sensitize  the  user 
to  actual  distributional  changes  over  time. 


Interyear  Reliability 

Fundamental  to  reliable  estimates  over  time  is  the  need  for 
stable  symptomatic  variables.  This  can  be  a  special  problem  when 
the  variables  are  derived  from  local  data  files.  As  noted  above, 
these  measures  can  be  affected  by  basic  changes  in  social  patterns 
as  well  as  by  technical  errors  at  the  data-gathering  stage.  To  help 
detect  such  deviations,  two  sets  of  procedures  have  been  explored 
at  Census  Use  Study. 

First, each data item included in the construction of the
symptomatic variables is examined for a minimum number of
occurrences per subarea (e.g., 25 cases). Then each item is tested for
substantial differences between years for each areal subunit. Thus,
if there were important differences of a certain item for a specific
census tract between 1970 and 1971, for example, the variable and
census tract are flagged for further examination. If the same census
tract indicated major changes for many items, however, greater
confidence can be placed in the abnormal changes between the
specific years. If, on the other hand, only one or two items are
indicative of major changes (and especially if the change occurs for
many subareas), the questioned item is further examined on an
individual basis.

Differences  between  years  for  each  item  for  each  tract  can  be 
treated  as  a  special  case  of  differences  of  proportions  for  a  single 
sample.  Thus: 


$$Z = \frac{P_{t2} - P_{t1}}{\sqrt{\dfrac{P_{t1}\, q_{t1}}{N}}}$$

where

$N = n_{t1} + n_{t2}$

$n_{t1}$ = number of specific data items in time 1

$n_{t2}$ = number of specific data items in time 2

$P_{t1} = n_{t1}/N$

$P_{t2} = n_{t2}/N$

$q_{t1} = 1 - P_{t1}$
Table 2.  Symptomatic Variables Used To Estimate Poverty by Type of Income:  County of Los Angeles

Column headings: Symptomatic variables; Total; Maturation index 1 through 9.
(The X marks entered in the cells of the table are not recoverable here.)

Percent births in public hospitals
Percent fathers under 17 years
Percent second births to semi-skilled fathers
Percent births with little prenatal care
Percent Anglo mothers under 17 years
Percent black mothers under 17 years
Percent other nonwhite mothers under 17 years
Percent Spanish-American fathers under 17 years
Percent other nonwhite parents under 17 years
Percent Spanish-American parents under 17 years
Areal density of parents under 17 years
Areal density of births
Percent deaths: semi-skilled occupations
Percent deaths 65 years and older
Percent youth deaths
Areal density of deaths 65 years and older
Areal density of youth deaths
Percent diseases reported by public facility
Percent childhood diseases
Areal density of disease reported by public facility
Areal density of critical diseases
Areal density of childhood diseases
Percent juvenile delinquents using public defender
Percent juveniles detained
Percent female juvenile delinquency
Reverse sex ratio (females/male): juveniles
Areal density of juvenile delinquency
Percent juveniles: police referral
Percent juveniles with families in poverty
Reverse sex ratio: adult probationers
Percent females on probation: adult
Percent inpatients on mental health file

In its application here, the Z score is primarily used as a
descriptive sensitizing score and only incidentally as a test of
significance, its normal application.
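A minimal sketch of the item-by-item screen follows. It reads the score as Z = (P_t2 - P_t1) / sqrt(P_t1 q_t1 / N), with N, P, and q as defined in the text; that reading of the printed formula is an assumption. The function names, the dictionary layout, and the 1.96 cut-off are also illustrative assumptions, since the paper treats the score as a descriptive flag rather than a formal test. The 25-case minimum is the example figure given in the text, applied here to both years.

```python
import math

MIN_CASES = 25  # example minimum number of occurrences per subarea from the text


def z_score(n_t1, n_t2):
    """Between-year score for one data item in one tract."""
    n = n_t1 + n_t2
    p_t1 = n_t1 / n
    p_t2 = n_t2 / n
    q_t1 = 1 - p_t1
    return (p_t2 - p_t1) / math.sqrt(p_t1 * q_t1 / n)


def flag_items(counts, cutoff=1.96):
    """Return the (tract, item) keys whose between-year change looks abnormal.

    counts maps (tract, item) -> (cases in time 1, cases in time 2).
    The cutoff is a hypothetical choice, not a value from the paper.
    """
    flagged = []
    for key, (n_t1, n_t2) in counts.items():
        if min(n_t1, n_t2) < MIN_CASES:
            continue  # too few occurrences to examine this item
        if abs(z_score(n_t1, n_t2)) >= cutoff:
            flagged.append(key)
    return flagged
```

A tract that is flagged on many items at once would then be treated differently from an item flagged in only one or two tracts, as the preceding paragraphs describe.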

Subsequent to distillation of the data items, each symptomatic
variable was subjected to an analysis of extreme trends. Here,
again, each variable is examined tract by tract for abnormal
between-year changes. To do this, interannual regressions and
correlations were calculated to compare the relative change between
time 1 and time 2. For example, when regressing 1968 on 1969 for
juvenile delinquency reverse sex ratio (i.e., new female juvenile
delinquent cases/new male juvenile delinquent cases), the
correlation was shown to be only 0.215. As shown in tables 2 and 3,
this variable was not selected as input into any of the final
equations. Later on in the paper, the same form of analysis will be
applied to the final poverty estimate measures.
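The interannual screening just described can be illustrated with a short sketch. It is not the procedure actually programmed for the study: the cutoff value, the function names, and the five-tract sample values are hypothetical, and only the idea of correlating a variable's tract values at time 1 with its values at time 2 comes from the text.

```python
import math

def interannual_correlation(year1, year2):
    """Pearson correlation between tract values of one variable in two years."""
    n = len(year1)
    m1, m2 = sum(year1) / n, sum(year2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(year1, year2))
    sxx = sum((a - m1) ** 2 for a in year1)
    syy = sum((b - m2) ** 2 for b in year2)
    return sxy / math.sqrt(sxx * syy)

def screen_variables(data_t1, data_t2, min_r=0.5):
    """Split variable names into (stable, unstable) lists by interannual r.

    min_r is a hypothetical cutoff; the paper does not state one.
    """
    stable, unstable = [], []
    for name in data_t1:
        r = interannual_correlation(data_t1[name], data_t2[name])
        (stable if r >= min_r else unstable).append(name)
    return stable, unstable

# Made-up values for five tracts, for illustration only.
t1 = {"pct_juveniles_detained": [10, 20, 30, 40, 50],
      "reverse_sex_ratio": [0.2, 0.5, 0.1, 0.4, 0.3]}
t2 = {"pct_juveniles_detained": [12, 19, 33, 41, 48],
      "reverse_sex_ratio": [0.4, 0.1, 0.3, 0.2, 0.5]}
stable, unstable = screen_variables(t1, t2)
```

A variable landing in the unstable list would be a candidate for exclusion, as the reverse sex ratio was in the study.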


An  Urban  Maturation  Index 

Many  metropolitan  areal  studies  have  shown  that  population 
distributions  and  characteristics  are  influenced  by  the  type  and 
length  of  time  subareas  have  been  urbanized.13  For  example, 
Hoover  and  Vernon  outlined  the  process  of  urban  development 
from  subareas  of  undeveloped  land  to  subareas  with  intensive 
urban  use.  As  a  comparison  to  a  pond  of  water,  they  drew  an 


13 Edgar M. Hoover and Raymond Vernon, Anatomy of a Metropolis,
Garden City, N.Y., Doubleday, 1962. Leo F. Schnore, "Municipal
Annexations and the Growth of Metropolitan Suburbs, 1950-1960," in
American Journal of Sociology, Vol. 67, January 1962, pp. 406-417.
Donald L. Bogue, The Population of the United States, Glencoe, N.Y.,
Free Press of Glencoe, 1959.




Table  3.    Symptomatic  Variables  Used  To  Estimate  Poverty  by  Income  by  Family  Size:    County  of  Los  Angeles 


Symptomatic  variables 

Total 

Maturation  index 

1 

2 

3 

4 

5 

6 

7 

8 

9 

X 
X 
X 

X 
X 

X 

X 

X 

X 
X 

X 

X 
X 

X 
X 

X 

X 

X 

X 

X 
X 

X 

X 
X 

X 
X 

X 

X 

X 
X 

X 

X 

X 

X 

X 
X 

X 
X 

X 
X 

X 

X 

X 
X 

X 
X 

X 

X 

X 
X 
X 

Percent  second  and  subsequent  births  to  semi-skilled  fathers

X 

Percent  Spanish-American  parents  under  17  years

X 

X 

Percent  childhood  diseases 

Areal  density  of  disease  reported  by  public  facility 

Percent  juveniles  with  families  in  poverty 

Areal  density  of  mental  health  patients  65  years  and  older 

analogy which suggested that regional growth is like a widening
ripple but, "not from a single pebble...dropped into a puddle, but
from a scattered handful of large, middling, and small pebbles, each
a focus of expansion."14 Thus urban development is not necessarily
consistent in time, nor does it occur by contiguous neighborhoods.
Even so, what seems to be erratic growth takes a typical land use
cycle along the following lines:

1. Start of residential development

2. Beginning and increase in multiple dwelling units

3. Conversion and downgrading of original structures

4. Thinning out through vacancy, abandonment, and demolition

5. Renewal 15


14 Hoover and Vernon, op. cit., p. 184.

In conjunction with land use patterns, they also observed that
subarea population characteristics are generally interwoven into the
cycle of time and the changing patterns forced by outward growth
of numerous urban subareas within metropolitan areas.16 If the
maturation process of subareas is associated with the type of
development and length of time subareas have been urbanized,
accounting for such a process should improve the reliability of
estimating population characteristics, whether by adding
information to the estimating equation or by reducing a source of
systematic variance.


15 Ibid., pp. 183-198.

16 Ibid., pp. 166-170, 188-191, 236.

To measure urban maturation, an index derived from one first
suggested by Duncan and others17 was developed. A qualitative
index, it can be generalized to all metropolitan areas because it is
derived from census tract information obtained from the 1970
Census of Housing. The urban maturation index contains nine
categories which identify different stages of metropolitan
development. The categories isolate patterns ranging from total
rural to very mature urban. To obtain the index, subareas were
cross-classified by type of dwelling unit and classified as to stage
of development according to the Hoover and Vernon land use cycle.
Each stage was classified as completed when a density criterion of
two single and/or two multiple (i.e., apartments) dwelling units per
gross acre was achieved before 1940, between 1940 and 1970, or not
built up as of 1970. The actual configuration of the nine categories
for Los Angeles County is shown in table 4. Figure 1 presents a
spatial distribution of the nine levels of maturation. The oldest
areas are represented by the darkest units and generally reflect the
core of the City of Los Angeles, Long Beach, Santa Monica-Venice,
Pasadena, and Hollywood.

Table 4.  Urban Maturation Index:  County of Los Angeles


Date of maturation              Date of settlement (single dwelling units)
(multiple dwelling units)       Before 1940      1940-1970      After 1970

Before 1940                     2 (N=65)         3 (N=41)       1 (N=58)

1940-1970                       4 (N=86)         5 (N=139)      6 (N=39)

After 1970                      7 (N=60)         8 (N=410)      9 (N=244)


1-Multiple DU (dwelling unit) density achieved before 1940,
single DU density not yet achieved in 1970.

2-Multiple DU density achieved before 1940, single DU density
achieved before 1940.

3-Multiple DU density achieved before 1940, single DU density
achieved between 1940 and 1970.

4-Multiple DU density achieved between 1940 and 1970, single
DU density achieved before 1940.

5-Multiple DU density achieved between 1940 and 1970, single
DU density achieved between 1940 and 1970.

6-Multiple DU density achieved between 1940 and 1970, single
DU density not yet achieved in 1970.

7-Multiple DU density not yet achieved in 1970, single DU
density achieved before 1940.

8-Multiple DU density not yet achieved in 1970, single DU
density achieved between 1940 and 1970.

9-Multiple DU density not yet achieved in 1970, single DU
density not yet achieved in 1970.
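The cross-classification in table 4 can be expressed as a simple lookup. The sketch below is illustrative only. It assumes that, for each subarea, the year in which the two-units-per-gross-acre criterion was reached is already known for single and for multiple dwelling units (None if not reached as of 1970); how those years are derived from the Census of Housing is not shown, and the function names are hypothetical.

```python
def period(year_density_reached):
    """Classify when the density criterion was reached.

    None (or a year after 1970) means not yet reached as of 1970.
    """
    if year_density_reached is None or year_density_reached > 1970:
        return "not yet"
    if year_density_reached < 1940:
        return "before 1940"
    return "1940-1970"

# (multiple-DU period, single-DU period) -> category, per the notes to table 4.
MATURATION_INDEX = {
    ("before 1940", "not yet"): 1,
    ("before 1940", "before 1940"): 2,
    ("before 1940", "1940-1970"): 3,
    ("1940-1970", "before 1940"): 4,
    ("1940-1970", "1940-1970"): 5,
    ("1940-1970", "not yet"): 6,
    ("not yet", "before 1940"): 7,
    ("not yet", "1940-1970"): 8,
    ("not yet", "not yet"): 9,
}

def maturation_category(multiple_year, single_year):
    """Return the maturation index (1-9) for one subarea."""
    return MATURATION_INDEX[(period(multiple_year), period(single_year))]

# Example: apartments reached the criterion in 1955, single units never did.
example = maturation_category(1955, None)
```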


A particularly useful aspect of the maturation index is that it
measures past performances of the urban growth process. Therefore,
the index should be more or less stable, and consequently, it can be
applied to estimate measures over time. The exception might occur in
those areas that move into the urban maturation process after 1970.

17 Beverly Duncan, Georges Sabagh, and Maurice D. Van Arsdol, Jr.,
"Patterns of City Growth," in American Journal of Sociology, Vol. 67,
January 1962, pp. 419-421. Maurice D. Van Arsdol, Jr., and Leo A.
Schuerman, "Redistribution and Assimilation of Ethnic Populations: The
Los Angeles Case," in Demography, Vol. 8, November 1971, pp. 459-480.


REFINING  THE  POVERTY  ESTIMATES 

The actual usefulness of the maturation index described above is
its qualitative reclassification capability. As such, the index may
have either an additive or a multiplicative effect on the gross
poverty estimates. To determine which effect, if any, the index
has, the original multiple regression equation configuration,
defined in the total columns of tables 2 and 3 for Los Angeles
County (and the results displayed in the total column of table 5),
can be reexamined through an analysis of multiple covariance.
Using the nine-class maturation index as the qualitative variable, a
multiplicative effect (i.e., an interaction between the qualitative
and the interval measures) can be determined through a test for
equality of slopes.

If there is a significant difference between the regression planes
of the different maturation areas, it indicates that the nine areas
should be treated separately. Thus, separate multiple regressions
would be calculated for each different set of subareas. On the other
hand, if the planes were found to have statistically similar slopes,
but the intercepts of the hyperplane had substantially different
values, it would suggest including the nominal categories as
additional information that should reduce the systematic error
variance around the regression plane.
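The test for equality of slopes can be sketched for the simplest case of a single symptomatic variable. This is an illustration under stated assumptions, not the computation used in the study, which involved several symptomatic variables at once. The function names and sample values are hypothetical. The F ratio compares a model with a common slope and separate intercepts against a model with a separate line for each category.

```python
def _sums(x, y):
    """Within-group sums of squares and cross-products about the group means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy

def slope_equality_f(groups):
    """groups: dict of category -> (x values, y values). Returns (F, df1, df2)."""
    stats = [_sums(x, y) for x, y in groups.values()]
    n = sum(len(x) for x, _ in groups.values())
    k = len(groups)
    # Residual sum of squares with a separate line fitted in each category.
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in stats)
    # Residual sum of squares with one pooled slope, separate intercepts.
    tot_sxx = sum(s[0] for s in stats)
    tot_sxy = sum(s[1] for s in stats)
    tot_syy = sum(s[2] for s in stats)
    sse_common = tot_syy - tot_sxy ** 2 / tot_sxx
    df1, df2 = k - 1, n - 2 * k
    f = ((sse_common - sse_separate) / df1) / (sse_separate / df2)
    return f, df1, df2

# Two made-up categories with clearly different slopes.
groups = {
    "category A": ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]),
    "category B": ([1, 2, 3, 4, 5], [1.2, 1.4, 1.9, 2.1, 2.4]),
}
f_ratio, df1, df2 = slope_equality_f(groups)
```

A large F ratio relative to its degrees of freedom points toward interaction, and hence toward separate regressions for each category. A small one points toward the additive treatment.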

However, the maturation index is qualitative in nature; therefore,
it has to be treated by a descriptive summary measure different from
the standard multiple regression equation. A statistical procedure
to handle a mixture of independent interval variables and
qualitative indexes is defined here as a multiple ETA18 and is
expressed thus:


\[
Y = a + b_{1y} x_{1} + b_{2y} x_{2} + \cdots + b_{ith} x_{ith} + f(x_{ith+1})
\]

where

$f(x_{ith+1})$ = mean average of residuals ($Z'''$) according to a
specific nominal factor.

$Z'''$ = actual dependent variable minus expected dependent variable
(e.g., poverty measure) along the regression plane (i.e., $Y - Y'$)
for the benchmark year.

$b_{ith}$ = regression coefficient obtained by regressing the
symptomatic variables to the measure of poverty for the benchmark
year.

$x_{ith}$ = specific symptomatic variable for the year to be
estimated.
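A minimal sketch of the multiple ETA idea follows, reduced to a single symptomatic variable so that it stays short. The reduction to one variable, the function names, and the sample values are assumptions made for illustration. The regression is fitted on the benchmark year, the residuals are averaged within each nominal category, and the category mean is added back when the equation is applied to a later year.

```python
from collections import defaultdict

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def fit_eta(x_bench, y_bench, category):
    """Fit on the benchmark year; return (a, b, mean residual by category)."""
    a, b = fit_line(x_bench, y_bench)
    residuals = defaultdict(list)
    for xi, yi, c in zip(x_bench, y_bench, category):
        residuals[c].append(yi - (a + b * xi))  # Y - Y'
    f = {c: sum(r) / len(r) for c, r in residuals.items()}
    return a, b, f

def estimate(a, b, f, x_new, category):
    """Apply benchmark coefficients to a later year's symptomatic values."""
    return [a + b * xi + f[c] for xi, c in zip(x_new, category)]

# Made-up benchmark data: four tracts in two hypothetical categories.
a, b, f = fit_eta([1, 2, 3, 4], [4, 4, 6, 10], ["old", "new", "new", "old"])
later_year = estimate(a, b, f, [2, 3], ["old", "new"])
```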

If the multiple covariance tests indicate that ETA is the
appropriate procedure for obtaining an equation of proxy measures,
this direction of the process would then also explain at least part
of the nondetermined curve patterns reflecting metropolitan growth
and change and its influence on poverty characteristics.

In the present example of Los Angeles County, however, there
was substantial interaction for both dimensions of poverty. The
difference between regression slopes of the different maturation
categories exceeded the .000001 level of confidence. Therefore,
the analysis did not proceed through the multiple ETA equations.
Instead, the regression procedures described earlier were rerun for
each category of the maturation index. Summary statistics of the
benchmark figures for the two poverty dimensions for each of the
index categories are shown in table 6. This table shows no
discernible pattern for the different categories.


18 For a more detailed discussion of ETA see Mordecai Ezekiel and Karl
A. Fox, Methods of Correlation and Regression Analysis, 3rd ed., New York,
John Wiley & Sons, 1959, pp. 378-387.
Table 5.  Multiple Correlation Coefficients and Standard Errors of Estimates for Selected Symptomatic Variables on Two Dimensions of Poverty
for Benchmark Year 1970:  County of Los Angeles


                                                             Maturation index
Measure                                        Total      1      2      3      4      5      6      7      8      9

Number of subareas                             1,142     58     65     41     86    139     39     60    410    244

TYPE OF INCOME

Multiple correlation coefficient                .850   .760   .910   .716   .933   .937   .971   .949   .881   .798
Corrected multiple correlation coefficient*     .849   .727   .898   .666   .926   .933   .963   .940   .880   .788
Standard error of estimate                     4.247  4.483  2.745  2.392  2.881  2.187  2.178  2.355  3.324  4.451
Corrected standard error of estimate*          4.262  4.781  2.931  2.589  3.045  2.261  2.483  2.579  3.349  4.555

INCOME BY FAMILY SIZE

Multiple correlation coefficient                .908   .825   .945   .970   .956   .940   .951   .943   .930   .882
Corrected multiple correlation coefficient*     .907   .806   .941   .964   .954   .938   .944   .933   .929   .878
Standard error of estimate                     7.332  9.236  5.077  3.761  5.355  4.698  6.066  5.553  5.523  8.012
Corrected standard error of estimate*          7.354  9.755  5.284  4.192  5.518  4.803  6.594  6.023  5.557  8.123

*Corrected for size of sample.


Table 6.  Basic Statistics for Selected Symptomatic Variables on Two Dimensions of Poverty for Benchmark Year 1970:  County of Los Angeles


                                          Maturation index
Measure                     Total      1      2      3      4      5      6      7      8      9

Number of subareas          1,142     58     65     41     86    139     39     60    410    244

TYPE OF INCOME

Mean average                 17.3   22.8   23.0   21.2   24.2   16.6   18.0   24.2    ...   13.8
Standard deviation            8.1    6.9    6.6    3.4    8.0    6.3    9.1    7.5    ...    7.4
Minimum score                 0.0    9.9    3.6   14.1    9.0    8.3    3.7   10.0    ...    0.0
Maximum score                65.5   51.5   38.1   28.2   45.4   55.7   60.5   45.4    ...   46.5
Skewness value                1.2    1.6    0.0   -0.1    0.8    3.0    2.7    0.9    ...    1.4
Kurtosis value                2.3    4.4    0.4   -0.6   -0.1   15.0    9.9    0.7    ...    2.6

INCOME BY FAMILY SIZE

Mean average                 29.6   40.3   40.2   35.9   42.1   27.2   30.4   43.6   25.0   24.3
Standard deviation           17.5   16.3   15.5   15.5   18.3   13.8   19.7   16.7   15.0   17.0
Minimum score                 0.0   15.1    5.7   12.8    8.0    7.6    5.4   11.8    3.1    0.0
Maximum score                96.2   96.1   72.1   59.5   78.8   87.5   94.5   90.4   76.9   89.7
Skewness value                0.9    0.7   -0.1    0.1    0.0    1.5    1.5    0.3    1.4    1.4
Kurtosis value                0.2    0.6   -0.8   -1.4   -1.2    3.3    2.0   -0.1    1.5    1.6

It should be noted that the same symptomatic variables first
determined for the total universe of territory were not necessarily
the ones selected for the disparate index categories. As might be
expected from the interaction results, which indicate the uniqueness
of the different maturation categories, certain symptomatic
variables should indicate association in some subareas and not in
others. This expectation was reflected in the results. Symptomatic
variable combinations which were eventually selected and regressed
for each of the nine categories of urban maturation are marked
with "x" in tables 2 and 3. The variables finally included had to
have regression coefficients that were significant at least at the
.10 level of confidence. The multiple correlation coefficients and
standard errors of the estimates for each maturation category are
presented in table 5. In general, separate equations calculated for
each maturation area show significant improvement over a single
regression equation for the entire area. These improvements in turn
reflect a substantial reduction in systematic error variance that
otherwise would be expected in the estimate equation.

Segmentation by maturation categories also helps identify
groups of subareas where problems of estimation are likely to
occur. These subareas would not be as readily apparent in a single
regression analysis for the entire universe of territory. For example,
the summary statistics in table 5 for index categories 1 and 9 are
significantly less stable than the other categories for both
dimensions of poverty, and category 3 is less stable for type of
family income. Thus, poverty estimates calculated for these more
variance-prone areas can be specifically identified and checked by
other sources for reliability.


POVERTY  ESTIMATE  RESULTS:  1968-1971 

The analytical steps above determined the final regression
equations that best fit between the symptomatic variables and the
two poverty measures. Since the multiple covariance test for
interaction indicated a control for each maturation category, a
total of 18 regression equations were obtained. These equations were
subsequently applied to their respective symptomatic variables for
1968, 1969, 1970, and 1971. For each of these separate years the
results of the nine different equations were combined into one data
set for each dimension of poverty.

Earlier in this paper, it was indicated that there should be a
single measure of poverty. For the present study this measure was
derived for each year by averaging for each census tract the
different percentages that represented the two dimensions of
poverty. The use of averages of different approaches to obtain a
single measure has been shown to be a useful way of obtaining more
reliable estimate values.19

Because an actual presentation of all the estimates for each tract
for each year is beyond the scope of this paper, a series of
computer maps is presented. The estimates for each of the 4 years
are shown in figures 2, 3, 4, and 5. Each map presents data in 10
equal class intervals. Thus, the darkest shading represents those
subareas in the highest poverty decile for the respective year.


CONCLUSION 


Testing the accuracy of estimating a component of the population
such as poverty is particularly problematic. In the case of total
population estimation, reliability is usually tested against an
outside criterion. A special census which has the same time and
space parameters as the estimate would be an example. In the case of
poverty, however, neither the time nor the spatial parameters are
usually available from an outside source. This is partially due to
the special and limited interest of the component and partially a
problem of estimating what is essentially a conceptual variable.
Generally, estimates are sought for more traditional components of
the population, such as total counts, sex, and age, where the
concept is more or less given. Since no outside concurrent component
of poverty was available, alternative tests were used to partially
cope with the problem of reliability.

In general, it was assumed that, while poverty should increase or
decrease in subareas over time, the changes should not be expected
to be abrupt from year to year. Therefore, by examining the
changes from year to year, one can be sensitized to the possibility
of "unreasonable" variance in the estimates. Table 7 presents the
summary statistics for the combined poverty measures for each of
the 4 years. A visual inspection shows little change in the means
and the standard deviations for the measures between each year.
The small absolute differences were also confirmed by critical ratio
tests between the years. None of the Z scores for the difference
between means were significant at the .01 level. This lack of
significance includes a test between 1968 and 1971 mean levels of
poverty. Only the latter test was significant at the .05 level,
which should be expected since the difference spans the four points
in time. These null results are especially interesting when viewed
within the context that the tests are particularly sensitive to the
size of the N.20 Thus, with the number of subareas being 1,142, it
is much easier to obtain statistical significance than with an N of,
say, 142.
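The sensitivity of the critical ratio to the size of N can be illustrated numerically. The sketch below assumes the usual large-sample form for the difference between two independent means; the paper does not state the exact formula used, and the means and standard deviations shown are hypothetical rather than taken from table 7.

```python
import math

def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Z score for the difference between two means (large-sample form)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / standard_error

def is_significant(z, level):
    """Two-tail test against the usual normal critical values."""
    critical = {0.05: 1.960, 0.01: 2.576}[level]
    return abs(z) >= critical

# The same one-point difference in means, tested with two sample sizes.
z_large = critical_ratio(20.0, 10.0, 1000, 21.0, 10.0, 1000)
z_small = critical_ratio(20.0, 10.0, 100, 21.0, 10.0, 100)
```

With 1,000 subareas the one-point difference reaches the .05 level but not the .01 level. With 100 subareas the same difference reaches neither.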


Table 7.  Basic Statistics and Critical Ratios of the Combined Poverty
Estimates for 1,142 Subareas for 1968, 1969, 1970, and 1971:
County of Los Angeles


Measure                        1968       1969       1970       1971

Basic statistics

Mean average                   22.1       22.9       23.4       23.2
Standard deviation             11.6       11.4       11.5       11.6
Minimum score                   2.2        3.0        4.8        3.1
Maximum score                  64.1       94.4       74.0       75.6
Skewness value                  0.9        1.2        1.0        0.9
Kurtosis value                  0.1        1.7        0.9        0.3

Critical ratio between:   1969-1968  1970-1969  1971-1970  1971-1968

Critical ratio                 1.00       1.20       0.44       2.33
Significance .05*              No         No         No         Yes
Significance .01*              No         No         No         No

*Two-tail test.


19 Henry S. Shryock, Jacob S. Siegel, and Associates, The Methods and
Materials of Demography, Vol. II, U.S. Bureau of the Census; Washington,
D.C., U.S. Government Printing Office, 1971, pp. 757-758. U.S. Bureau of
the Census, Current Population Reports, Series P-25, Population Estimates
and Projections; Washington, D.C., U.S. Government Printing Office.

20 For a discussion of the N influencing the test results see Hubert M.
Blalock, Social Statistics, New York, McGraw-Hill, 1960, pp. 226-228.

To further analyze the reliability of the poverty measures over
time, interannual correlations were run to measure the relative
reliability. Thus, the higher the correlation value, the more stable
the distribution of census tract values is relative to the subsequent
year (or years). Comparing the years separately, the following
interannual correlations were obtained for the 1,142 subareas in
Los Angeles County:

1968 poverty -> 1969 poverty = .887

1969 poverty -> 1970 poverty = .929

1970 poverty -> 1971 poverty = .934

To test the total consistency of the poverty distributions across
the 4 years of the estimate, a multiple interannual correlation was
obtained. Here, estimates for 1968, 1969, and 1970 were the
independent variables, and the 1971 estimate was the dependent
variable. The multiple coefficient of determination was .893 (R =
.945), which indicates that 89.3 percent of the variance in the 1971
estimate is accounted for by the previous three points in time. The
standard error of the estimate was 3.8. The distributional pattern
of the poverty trend across time is shown in figure 6. The darkest
areas show subareas where the estimates suggest the greatest
deviations. For example, only 11 of the 1,142 subareas showed
relative changes exceeding 9 percent.
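The figures quoted above can be checked against one another with a short calculation. The check assumes that the standard deviation of the 1971 estimate is the 11.6 reported in table 7, and it ignores the small degrees-of-freedom correction, so it is an approximation rather than the computation used in the study.

```latex
\begin{align*}
R &= \sqrt{R^{2}} = \sqrt{0.893} \approx 0.945 \\
1 - R^{2} &= 1 - 0.893 = 0.107 \\
s_{e} &\approx s_{y}\sqrt{1 - R^{2}} = 11.6 \times \sqrt{0.107}
      \approx 11.6 \times 0.327 \approx 3.8
\end{align*}
```

The result agrees with the reported multiple correlation of .945 and the reported standard error of the estimate of 3.8.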

The above statistics would seem to suggest that while there is
some change in poverty status among the different areal units
across the 4 years, the pattern is relatively stable. Thus, even
though no actual comparison of the estimate of poverty to an
outside source was possible, there are indications that the
analytical procedures produce estimates at census tract levels from
the benchmark year consistent with expectations of structural
pattern shifts within a changing metropolitan area. The question as
to the fit of this index with "true" poverty has not been tested,
and even though such tests are needed, they go beyond the scope of
this paper.


The  Use  of  Driver  License 
Address  Change  Records  for 
Estimating  Interstate 
and  Intercounty  Migration 

W.  Nelson  Rasmussen 
California  Department  of  Finance 

INTRODUCTION 

The demographer who has chosen population estimating as a
way of life is in the enviable position of having his errors
discovered only once in 10 years. On the other hand, he lives in a
constant state of uncertainty since he finds out if what he has been
doing is right only every 10 years also. This uncertainty can give
way to genuine uneasiness when, as in California, the numbers which
he produces are used in a wide variety of fiscal programs. For
example, $460 million is distributed annually to cities and counties
directly as a result of the estimates of the Population Research
Unit in the Department of Finance. In addition, transportation
funds, probation subsidies, and property tax rates all are affected
by population estimates. Even the number of liquor licenses issued
for bars and saloons is tied to the increase in population.

To  remove  some  of  the  uncertainty  surrounding  this  task,  the 
State  demographer  is  constantly  searching  for  new  and  better  data 
sources,  particularly  those  relating  to  migration.  Since  the  United 
States  does  not  have  a  universal  registration  system,  this  means  the 
search  is  for  symptomatic  indicators,  data  which  hopefully  give 
indication,  however  indirect,  of  population  movement. 

This paper is a preliminary report on the development of a
migration data series which, potentially at least, could be the most
timely, comprehensive, and accurate used to date: a file presenting
recorded address changes of driver license holders. Such a file has
been developed for the California Department of Motor Vehicles.

The file is of particular interest to population researchers
because of five characteristics uniquely present in a single series.
In the first place, the processing of the data is prompt, an
attribute possessed by few series of administrative data. For
example, 6 months or more elapse before complete birth or death
records (by county of residence of the mother or of the decedent,
respectively) for California are available from State sources, and
the reports from Federal sources can take several years. By
contrast, the reported changes of driver addresses are available to
the researcher on a much more current basis: they appear monthly
within 2 weeks of close-out. Secondly, the series presents moves
into and out of designated areas, unlike school statistics from
which only net figures can be derived. This feature is a valuable
addition since it tells the researcher what additions and
subtractions are associated with the net changes, and these are far
more realistic variables, presumably more closely associated with
economic or environmental indicators. After all, in the real world
there is no such thing as a "net migrant." A third advantage is the
composition of the file; the drivers in the file are an adult group
largely under 65 years of age. This is the age span for which very
few other migration indicators are available. The data are
subdivided by sex, and further subdivisions by age are possible at
the option of the user. The fourth advantage lies in the nature of
the material. The licenses are part of a huge file from which it is
relatively easy to sample on the basis of terminal digits or other
sampling schemes which can be efficiently applied to names and
addresses attached to registration numbers. The fifth advantage of
the file is its volume; annually over 600,000 intercounty moves and
over 500,000 interstate moves are recorded. All of the supposed
advantages of this file are of no significance to population
estimates if it cannot do one thing: estimate net migration more
accurately than currently available methods.
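Sampling on terminal digits, mentioned above as the fourth advantage, amounts to keeping those records whose license number ends in one of a few chosen digits. The sketch below is illustrative only: the license numbers are invented, and the choice of the digits 3 and 7 is arbitrary.

```python
def terminal_digit_sample(license_numbers, digits=("3", "7")):
    """Keep licenses whose final character is one of the chosen digits.

    Two of the ten terminal digits give roughly a 20-percent sample if
    final digits are evenly distributed (an assumption).
    """
    chosen = set(digits)
    return [number for number in license_numbers if number[-1] in chosen]

# Invented license numbers, for illustration only.
licenses = ["A1234563", "B7654321", "C0000007", "D5550004"]
sample = terminal_digit_sample(licenses)
```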

This paper will describe the file, its history and characteristics.
Particular attention will be given to the various efforts to establish
the validity of the file as a migration estimating tool. A brief
description of the current methodology for using the file to
estimate migration will be given. The reader is warned that this is
a preliminary report, perhaps a bit premature, but since this unique
file has attracted much interest at the Census Bureau and in other
States, it is hoped that this presentation will help to foster the
development of similar files elsewhere.

THE  CALIFORNIA  DLAC  FILE 

Naturally in a State as large as California such a reporting
system is possible only because the driver license records became
fully automated in 1969. The driver license address change (DLAC)
file is established by selecting all records which show a change in
county code between the new and old license. Several transactions
can result in a change of address report: a license renewal showing
different addresses for old and new, a change of address
notification form, a new license issued to a holder of a valid
license in another State or country, or a duplicate license issued
to replace a lost one or because of a name change, again showing
different old and new addresses. Identification cards issued by the
Department of Motor Vehicles (DMV) are also included. These records
are tabulated for a month and then a report is issued, as shown in
exhibit 1. The report now provides data on five age groups, each by
sex. It also shows the complete matrix of moves between all counties
plus out-of-State as one category. While the regular report shows
only one-half of the matrix (that is, moves out of a particular
county to each of the other counties), annually, and more frequently
on special request, reports are issued showing the other half of the
matrix and a net change tabulation. Recently a separate set of
reports has been designed showing the State of origin or
destination of all movers between California and other States. (See
exhibit 2.)
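The selection and tabulation just described can be sketched as follows. The record layout and field names (old_county, new_county) are hypothetical and do not reflect the actual format of the Department of Motor Vehicles file; the sketch shows only the logic of keeping records with a county change, counting moves by origin and destination, and deriving a net change for one county.

```python
from collections import Counter

def tabulate_moves(records):
    """Count moves by (old county, new county), keeping only county changes."""
    moves = Counter()
    for record in records:
        if record["old_county"] != record["new_county"]:
            moves[(record["old_county"], record["new_county"])] += 1
    return moves

def net_change(moves, county):
    """In-movers minus out-movers for one county."""
    moved_in = sum(n for (old, new), n in moves.items() if new == county)
    moved_out = sum(n for (old, new), n in moves.items() if old == county)
    return moved_in - moved_out

# Invented address-change records, for illustration only.
records = [
    {"old_county": "Alameda", "new_county": "Contra Costa"},
    {"old_county": "Alameda", "new_county": "Contra Costa"},
    {"old_county": "Contra Costa", "new_county": "Alameda"},
    {"old_county": "Alameda", "new_county": "Out of State"},
    {"old_county": "Alameda", "new_county": "Alameda"},  # same county, dropped
]
moves = tabulate_moves(records)
alameda_net = net_change(moves, "Alameda")
```

The out-movers by destination correspond to the half of the matrix shown in the regular monthly report. The in-movers and the net figure correspond to the supplementary tabulations.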

The development and preparation of these reports for the
Population Research Unit was possible only because of the
cooperative attitude of the Department of Motor Vehicles' Division
of Driver Licenses, particularly the EDP Issuance Unit. Their
willingness to develop and modify the program has shown a complete
lack of the jurisdictional rivalry often observed between State
agencies. I suspect that getting the cooperation of the agency
controlling driver licenses may be one of the biggest obstacles to
researchers desiring to establish a similar system in their State.

As these records began to accumulate, we were interested in their
possible significance. Certain questions about the data immediately
came to mind. Since a California driver license is normally valid for
4 years, how diligent are persons about reporting an address change
to the Department of Motor Vehicles other than when renewing a
license? Furthermore, how prompt are those who do send in a
change of address; that is, how soon after the move do they report
it? Are there differences in reporting between movers because of
distance moved, age and sex, and socioeconomic status? Even if
Exhibit  1.    Monthly  County  Migration  Report  for  Movers  From  Old  County  to  New  County:    County  of  Alameda


New   county 


Alpine 

Amador 

Butte 

Calavaras 

Colusa 

Contra  Costa. . . 

Del  Norte 

El    Dorado 

Fresno 

Glenn 

Humboldt 

Imperial 

Inyo 

Kern 

Kings 

Lake 

Lassen 

Los   Angeles. .  . . 

Madera 

Marin 

Mariposa 

Mendocino 

Merced 

Modoc 

Mono 

Monterey 

Napa 

Nevada 

Orange 

Placer , 

Plumas , 

Riverside , 

Sacramento , 

San   Benito 

San   Bernardino., 

San   Diego 

San   Francisco... 

San  Joaquin 

San    Luis   Obispo. 

San   Mateo 

Santa   Barbara... 

Santa   Clara 

Santa   Cruz 

Shasta 

Sierra 

Siskiyou 

Solano 

Sonoma 

Stanislaus 

Sutter 

Tehama 

Trinity 

Tulare 

Tuolumne 

Ventura 

Yolo 

Yuba 

Out   of   State... . 


Under   25   years 


Male        Female 


3 

5 
42 
11 

3 
488 

2 
57 
67 

4 
39 

6 

3 
20 
13 

6 

7 
317 

4 
58 

3 
12 
26 

1 


Totals    for  Alameda. 


Represents    zero. 


5 
44 

3 

2 
587 

2 
29 
62 

4 
37 

1 

13 

8 
10 

3 
267 

2 
56 

1 
20 
19 


36 

40 

24 

18 

13 

4 

86 

75 

31 

32 

23 

9 

28 

22 

84 

116 

2 

- 

33 

30 

134 

117 

214 

209 

58 

66 

17 

25 

113 

164 

25 

31 

357 

425 

56 

65 

22 

20 

- 

2 

4 

3 

50 

52 

94 

84 

36 

36 

6 

6 

4 

3 

- 

2 

4 

15 

16 

12 

21 

33 

43 

53 

13 

12 

856 

677 

700 

3,633 

25-29 


Male        Female 


5 
37 
4 
3 
796 
2 
35 
58 
4 
54 
6 
2 
16 
11 
10 
8 
429 
2 
118 
3 
35 
13 
1 
1 
56 
28 
11 
102 
40 
11 
22 
161 
2 
35 
130 
426 
76 
24 
201 
38 
524 
64 
36 
3 
9 
71 
102 
30 
4 
4 
1 
7 
9 
32 
55 
6 
1,365 

5,338 


4 
38 

7 

2 
780 

4 
27 
41 

3 
40 

4 

2 
11 
13 
11 

5 

367 

10 

106 

4 
28 


49 
32 
11 
90 
17 
7 
10 
135 

24 

146 

338 

69 

15 

167 

35 

404 

47 

25 

1 

5 

51 

73 

43 

6 

1 

1 

14 

9 

18 

36 

6 

,040 


30-44 


Male        Female 


1 
14 
27 
16 
3 
1,293 
2 
56 
80 
5 
40 
4 
1 
29 
10 
16 
5 
500 
13 
112 
5 
47 
30 
2 
1 
57 
51 
19 
150 
41 
8 
37 
188 
3 
35 
186 
482 
115 
14 
293 
41 
633 
62 
40 

5 

104 

126 

65 

8 

13 

6 

23 

10 

49 

43 

8 

,853 


2 

7 
27 

8 

1 
932 

3 
39 
72 

2 
30 

4 

2 
25 
11 
12 

2 
320 

8 

103 

3 

45 

19 

1 

37 

21 

21 

118 

27 

7 

27 

147 

3 

36 

167 

198 

107 

9 

155 

23 

363 

42 

30 

1 

4 

72 

85 

57 

3 

11 

3 

12 

11 

27 

38 

7 

1,278 


45-64 


Male        Female 


4,442         7,080         4,825 


2 
24 
53 
20 

5 
707 

4 
55 
50 

6 
14 

1 

14 

3 

53 

6 

190 

11 

46 

7 

21 

18 

3 

1 

41 

34 

28 

64 

47 

7 

23 

108 

17 

77 

124 

100 

6 

135 

18 

229 

50 

38 

2 

13 

62 

83 

51 

6 

7 

5 

13 

25 

15 

24 

6 

841 


27 
51 
17 

3 
585 

4 
44 
45 

7 
16 

2 

12 

1 
49 

4 
141 

9 
39 

5 
19 
17 

2 

30 
38 
21 
50 
35 
5 
23 
87 

14 

72 

61 

73 

9 

101 

10 

190 

42 

23 

3 

10 

34 

74 

44 

3 

13 

4 

12 

21 

11 

18 

8 

648 


65   and   over 


Male        Female 


3,613         2,886 


10 

20 

8 

2 

183 

16 
12 

1 

7 


7 
1 

20 
1 

39 
3 
9 
2 
4 
7 


14 

13 

8 

9 

18 

2 

11 

22 

4 
23 
24 
24 

2 
20 

2 
36 
16 

4 


14 

31 

25 

2 

6 


11 

4 

3 

3 

180 

891 


2 

4 
7 
4 

164 


2 
1 

12 
1 

28 


17 
20 

2 
10 

9 

9 

17 

3 

6 
12 
18 

11 

6 

45 

15 

5 

1 
11 
27 
12 

1 

1 

4 
5 


617 


Total 
male 


6 
58 
179 
59 
16 
3,467 
10 
219 
267 
20 
154 
17 
6 
86 
38 
105 
27 
1,475 
33 
343 
20 
119 
94 
7 
3 
204 
150 
79 
411 
177 
51 
121 
563 
7 
124 
550 
1,270 
373 
63 
762 
124 
1,779 
248 
140 
5 
31 
301 
436 
207 
26 
34 
12 
55 
71 
121 
168 
36 
5,095 

20,622 


Total 
female 


4 
47 
167 
39 
8 
3,048 
13 
147 
227 
16 
123 
12 
4 
63 
34 
94 
15 
1,123 
29 
307 
13 
115 
65 
4 
1 
173 
129 
59 
343 
120 
28 
91 
502 
3 
107 
508 
818 
333 
58 
598 
105 
1,427 
211 
103 
7 
23 
220 
343 
192 
19 
29 
10 
57 
58 
97 
150 
33 
3,731 

16,403 


Total, 
both 
sexes 


10 
105 
346 
98 
24 
6,515 
23 
366 
494 
36 
277 
29 
10 
149 
72 
199 
42 
2,598 
62 
650 
33 
234 
159 
11 
4 
377 
279 
138 
754 
297 
79 
212 
1,065 
10 
231 
1,058 
2,088 
706 
121 
1,360 
229 
3,206 
459 
243 
12 
54 
521 
779 
399 
45 
63 
22 
112 
129 
218 
318 
69 
8,826 

37,025 


the  records  were  accurate  for  driver  license  holders,  can  it  be 
assumed  that  non-license-holders  have  the  same  mobility  patterns? 
To  answer  all  of  the  questions  fully  would  require  a  large  research 
project  involving  extensive  use  of  direct  questionnaires  to  driver 
license  holders.  We  are  formulating  a  proposal  for  such  a  project 
now.  In  the  meantime  some  efforts  to  test  the  usefulness  of  the  file 
have  been  made,  including  comparisons  with  other  administrative 
data  and  a  sample  survey  conducted  for  the  Population  Research 
Unit. 


In February 1974, the Population Research Unit contracted
with the Field Research Corporation to ask three questions on its
regular quarterly sample survey of California. As you can see in
exhibit 3, the key question had to do with the current status of the
address of driver license holders. By relating the question to
gasoline rationing, we hoped to avoid some of the bias normally
expected when you ask people to admit to a possible violation of
regulations. The other two questions, which were asked later in a
different part of the interview to avoid response bias, were
designed to provide migration data which could be cross-tabulated
with the responses.

Exhibit 2. Yearly State Migration Report for Movers From Old California County to New State: State of Alabama

[The exhibit lists each old California county, Alameda through Yuba,
with movers tabulated by age (under 25 years, 25-29, 30-44, 45-64,
65 and over) and sex, and with total male and total female columns.
The county-level cells are not legible in this copy.]

Totals for Alabama:   total male, 884; total female, 678.

- Represents zero.

A primary consideration in any survey is the representativeness
of the sample. The field survey is designed to provide an unbiased,
representative sample of the adult population in households;
in addition, the sample appears reasonably consistent
with the 1970 census mobility patterns, as seen in table 1.
But the question essential to the survey is how accurate the
responses were with respect to the reporting of address change. Since we
could not make a record check, the exact accuracy could not be
determined. However, the interviewer was instructed to suggest to
the informant that he might want to check his driver license, and it
is possible to examine the differences in response between those
who checked and those who did not. The comparison, shown in
table 2, reveals potentially significant differences between the two
groups in some responses, although the small number who checked
their licenses limits the applicability of the data. Nevertheless, it
does appear that, among those who checked licenses, far fewer
report their address as "current", that is, the same as on their
license. However, there was almost no difference between the two
groups as to the proportion who reported that their record was not
current and that they had not yet notified DMV of their new
address. This may indicate the size of the real parameter, or it may
show merely that the understatement of noncompliance is
independent of whether they checked their license or not.
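One way to gauge whether the differences in table 2 are larger than chance is a two-proportion z test. The sketch below is illustrative only and is not part of the original analysis: it takes the percentages and group sizes from table 2 and assumes simple random sampling, which the quarterly field survey may not satisfy. The function name is hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent proportions.

    Uses the pooled proportion under the null hypothesis of no difference.
    Assumes simple random sampling (an assumption, not stated in the paper).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


# Address reported as "current": checked license (N=83) vs. did not check (N=953).
z_current = two_proportion_z(0.368, 83, 0.557, 953)

# Not current and DMV not yet notified: same two groups.
z_not_notified = two_proportion_z(0.211, 83, 0.224, 953)

print(round(z_current, 2), round(z_not_notified, 2))
```

Under these assumptions the "current" proportions differ by more than the conventional 1.96 threshold, while the "not notified" proportions do not, which agrees with the reading of table 2 given in the text.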

Exhibit  3.   Questions  Asked  by  Field  Research  Corporation, 
February  1974 


1. How long have you lived at this address?

Less than 6 months ........ 1

6 to 11.9 months .......... 2

1 to 1.9 years ............ 3

2 to 4 years .............. 4

More than 4 years ......... 5


IF  4  YEARS  OR  LESS,  ASK: 

2.  Where  did  you  live  before  you  lived  at  this  address? 

AT  ANOTHER  ADDRESS  IN  THIS  CITY 1 

IN  ANOTHER  CITY,  BUT  IN  THIS  COUNTY 2 

IN  ANOTHER  CALIFORNIA  COUNTY 3 

IN  ANOTHER  U.S.  STATE 4 

IN  A  FOREIGN  COUNTRY 5 

3.  As  you  may  know,  the  Federal  Government  is  considering  a  system  of  ra- 
tioning gasoline.   One  method  that  has  been  proposed  is  to  distribute 
gas  ration  coupons  to  driver's  license  holders  over  18  using  the 
address  on  file  with  the  Department  of  Motor  Vehicles.   To  the  best  of 
your  knowledge,  which  of  the  statements  on  this  card  (HAND  EXHIBIT 
CARD)  describes  the  current  status  of  your  driver's  license  address. 
You  can  just  give  me  the  number  of  the  statement  that  fits  your  answer. 

1.  The  address  printed  on  the  front  of  my  license  is  my  current 
address. 

2.  The  address  printed  on  the  front  of  my  license  is  not  my  current 
address,  but  I  have  notified  the  Department  of  Motor  Vehicles  of 
my  new  address. 

3.  The  address  printed  on  the  front  of  my  license  is  not  my  current 
address  and  I  have  not  yet  notified  the  department  of  the  change. 

4.  I  have  no  driver's  license. 

INTERVIEWER  PROBE:   "Would  you  like  to  take  the  time  to  check  your 
license  just  to  be  sure?" 


INDICATE HERE IF RESPONDENT CHECKED LICENSE OR NOT:

CHECKED LICENSE ............ [ ]

DID NOT CHECK LICENSE ...... [ ]


With these precautionary considerations regarding the validity
of the sample, let us look at some of the more interesting results.
As might be expected, the length of time a person had lived at his
address was strongly associated with the reporting of change of
address; almost 45 percent of the most recent movers had not reported,
whereas less than 10 percent of those who had lived at their
address between 2 and 4 years had not reported. (See table 3.)
Conversely, from a positive viewpoint, the data suggest that within a
year after a move 85 percent of the drivers indicated that the
department had a record of their current address as a result of some
transaction: renewal, change of address notice, issuance of
duplicate license, etc. Incidentally, traffic citations are not used in the
change of address file.

Age is also a factor, as might be expected, with those under 45
years having a noncompliance rate about twice that of the
45-and-older group, even if consideration is given to the more frequent
migration of the younger group. Since the under-45 and particularly
the under-30 group make up the majority of the migration, these
findings could have serious implications.

There is clearly much more to be learned about the relation of
each individual move to the recording of it in the driver license
address change report. Additional study will require the comparison
of questionnaire responses to the status of records on the file.


Table 1. Mobility in Sample Compared With Mobility in 1970 Census

(Percent)

[The row labels for the mobility categories are not legible in this
copy; values are listed by column in their original order.]

Sample,¹ total:                  46.9   22.9   13.2   9.3   4.8   1.4   1.4   1,179
Sample,¹ driver licenses only:   45.6   23.2   13.7   9.9   4.6   1.3   1.6   1,041
Census²:                         44.0   Movers:   33.0   10.8   5.6   2.6   5.6   (NA)

NA Not available.

¹ Movers are persons 18 years and over who have lived at
current address less than 4 years.

² Movers are persons 15 years and over who did not live at
same address in 1965 (5 years earlier).

Source: U.S. Bureau of the Census, 1970 Census of Population,
Volume PC(1)-C6, California, pp. 6-382, and unpublished
Fieldscope Report, February 1974.


Table 2. Status of Driver License Address Record for Movers
by Whether Checked License During Interview

(Percent)

                                     Status of address record

Whether license checked     Current    Not current,    Not current,
                                         notified      not notified

Checked (N=83)                 36.8        40.4            21.1
Did not check (N=953)          55.7        21.9            22.4

Source: Unpublished Fieldscope Report, February 1974.

Table 3. Drivers Not Reporting Change of Address by Time at
Current Address

(Percent)

Lived at current address       Address change
                                not reported

Less than 6 months                  44.5
6 to 11.9 months                    28.3
1 to 1.9 years                      15.6
2 to 4 years                         7.2
More than 4 years                    0.8

Source: Unpublished Fieldscope Report,
February 1974.

The indication from the survey that as many as 75 to 85 percent
of movers have updated their records within a year of a move
seems to be substantiated by comparing the intercounty movers
from the DLAC file with those from a match of 1 percent of the
Federal income tax returns. The match of Federal tax returns was
made by selecting 1 percent of the returns on the basis of Social
Security numbers for 1 year and attempting to match that sample
with a similar sample selected in the next year. The comparisons
presented here are limited to the 32 largest counties since the
1-percent sample does not yield sufficient numbers in the smaller
ones.

Based on the cumulative migration for 3 years, there appears to
be an extremely high degree of association between the two
migration measures. (See table 4.) Similarly, a scatter diagram showing
the comparative numbers for net migration reveals some
dispersion but still indicates a strong linear relationship (figure 1).
Since in a 3-year cumulative comparison there may be some
compensating error, a 1-year comparison was also tested. For this
test a third set of numbers was introduced, an annual tabulation of
automobile registration address changes between counties recorded
during the registration renewal period for automobiles. (See table
5.) Again the correlation is very high, apparently indicating that even if
driver license address changes are not immediately recorded, a
sufficient number are updated within a reasonable period of time
that the migration is quite consistent with other data which
require an annual update. This is not to imply, of course, that any
of these measures is the same as actual migration.
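The correlations cited here are presumably ordinary product-moment (Pearson) coefficients computed across counties; the paper does not say so explicitly, so that is an assumption. A minimal sketch of the calculation follows. The county figures are invented for illustration and are not the DMV or tax-return data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical net migration for five counties from two sources.
dlac_net = [1200, -450, 3100, 80, -900]       # driver license address changes
tax_net = [1100, -500, 2900, 150, -1000]      # matched tax returns

r = pearson_r(dlac_net, tax_net)
print(round(r, 4))
```

A coefficient this close to 1 shows only that the two series rank and scale the counties alike; as the text notes, it does not show that either series equals actual migration.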

Table 4. Correlation Between Driver License Address Changes and
Federal Tax Returns for Intercounty Migration,
1970-1973 Cumulative

Migration      Correlation

In                0.9965
Out                .9987
Net                .9954

Source: Unpublished records from the
Department of Motor Vehicles and the
Franchise Tax Board.


Table 5. Correlation of Driver License Address Changes with
Federal Tax Returns and Auto Registrations
for Intercounty Migration, 1972-1973

Migration      Tax returns      Auto registrations

In                0.9953             0.9873
Out                .9973              .9988
Net                .9878              .9903

Source: Unpublished records from the
Department of Motor Vehicles and the
Franchise Tax Board.

The most recent item relating to the validity of the file was the
result of an expansion of the tabulation program itself. In April
1974, we were able to obtain for the first time a tabulation
showing the State of previous driver license for new California license
holders as well as the tabulation by new State of former
Californians whose licenses had been returned. (See table 6.) The data
revealed some gaps in the reporting system: a few States
quite obviously were not returning records to the California DMV.

Table 6. Driver License Address Changes Between California
and Other States: July 1, 1973, through March 31, 1974

                              Driver license address changes

State                      Into Calif.   Out of Calif.        Net

Total, excluding unknown     184,957        173,280        11,677
Unknown                       23,421          2,835        20,586

Grand total                  208,378        176,115        32,263

Alabama                        1,492          1,562           -70
Alaska                           804          1,603          -799
Arizona                        7,930         15,858        -7,928
Arkansas                       1,488          4,486        -2,998
California                       (X)            (X)           (X)

Colorado                       6,531          9,521        -2,990
Connecticut*                   3,119            109         3,010
Delaware                         294            324           -30
District of Columbia*            522             55           467
Florida                        6,832          6,623           209

Georgia                        2,189          2,408          -219
Hawaii                         4,431          4,816          -385
Idaho*                         1,396            130         1,266
Illinois                      12,974          5,986         6,988
Indiana                        3,670          2,935           735

Iowa*                          2,619          1,932           687
Kansas                         2,727          2,609           118
Kentucky                         944          1,226          -282
Louisiana                      2,802          2,602           200
Maine                            525            576           -51

Maryland                       3,376          2,610           766
Massachusetts*                 4,857            180         4,677
Michigan                       8,548          5,760         2,788
Minnesota                      3,954          2,465         1,489
Mississippi                    1,101          1,093             8

Missouri                       4,474          3,329         1,145
Montana                        1,050          1,845          -795
Nebraska                       1,885          1,659           226
Nevada                         3,569         11,483        -7,914
New Hampshire                    588            528            60

New Jersey                     6,890          2,511         4,379
New Mexico                     3,001          3,682          -681
New York                      15,862          6,656         9,206
North Carolina                 1,880          2,277          -397
North Dakota                     612            365           247

Ohio*                          8,694          1,878         6,816
Oklahoma                       3,273          4,780        -1,507
Oregon                         5,184         18,873       -13,689
Pennsylvania*                  6,731            182         6,549
Rhode Island                     889            492           397

South Carolina*                  863            214           649
South Dakota                     703            619            84
Tennessee                      1,700          2,040          -340
Texas                         12,000          5,584         6,416
Utah                           3,233          6,267        -3,034

Vermont*                         261             12           249
Virginia                       3,460          4,256          -796
Washington                     8,318         13,313        -4,995
West Virginia                    525            664          -139
Wisconsin                      3,462          2,274         1,188
Wyoming*                         725             28           697

[The original table also carries "Adjusted out of Calif." and
"Adjusted net" columns, which apply to the starred States only and
are marked X (not applicable) elsewhere; the adjusted figures are
not legible in this copy.]

* "Out of Calif." not reported because of inaccurate data (see text).
- Represents zero.   X Not applicable.


As a result of this tabulation, the California DMV has written to its
sister agencies in the affected States seeking their cooperation in
returning records. The tabulation was still extremely valuable
because a known error is far better than an unknown one. Since
the understatement of out-migrants appears to affect only a few
States, we were able to compensate for the missing number by
assuming the same ratio of in-migrants to out-migrants for those
States as reported for the 1965-70 period in the census. (See table
6.) I will not take the time to discuss some of the interesting
patterns revealed by the data except to observe that Governor
McCall of Oregon, whose slogan is "You're welcome to visit but
please don't stay," probably would not be pleased to know the
extent to which Californians are relocating there.

Figure 1. Comparison of Intercounty Net Migration for Selected California Counties as Derived from Driver License Address Changes
and Federal Tax Returns
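The adjustment for a nonreporting State can be sketched as follows. This is an illustration of the ratio assumption described above, not the Unit's actual program: the function name is hypothetical, and the census flows used in the example are placeholders rather than 1965-70 census figures. Only the in-migrant count for Connecticut (3,119) comes from table 6.

```python
def adjusted_out_migrants(current_in, census_in, census_out):
    """Estimate out-migrants to a State that is not returning licenses.

    Assumes the State's current ratio of in-migrants to out-migrants
    equals the ratio reported in the census for the earlier period.
    """
    ratio_in_to_out = census_in / census_out
    return current_in / ratio_in_to_out


# In-migrants from Connecticut, July 1973 through March 1974 (table 6).
connecticut_in = 3119

# Placeholder census flows: twice as many in-migrants as out-migrants.
estimate = adjusted_out_migrants(connecticut_in, census_in=20000, census_out=10000)
print(estimate)
```

With the placeholder ratio of 2 to 1, the estimate is half the recorded in-migration; the recorded figure of 109 returned licenses would be replaced by this estimate.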


From our various efforts to date we have come to some
tentative conclusions regarding the driver license address change report.
The DLAC report appears to be consistent with other records
which have a mandatory annual updating such as Federal income
tax returns and auto registrations. On the basis of responses to a
direct question, a little more than 20 percent of the persons who
had moved within the last 4 years have not had their current address
recorded. The percentage is much higher for those who have lived
at their current address less than 6 months; nevertheless, 85
percent of the movers have brought their record up to date after a
year. Finally, we learned that not all States are returning the
licenses of former Californians, thus causing the number of
out-migrants to be understated. Since the understatement is small in
relation to the total, this error can be compensated for.

As a result of the various checks that we have been able to
make, or perhaps in spite of them, we decided (or rather I decided, to
put the blame where it belongs) that the time had come
to incorporate the data, somehow, into our regular county population
estimating program. While there is much more to be learned about the
data, they certainly are no worse than many of the indicators now
used to estimate migration. The methodology for using the data is
relatively straightforward, making use of a variety of other data
sources that are easily obtainable. For those acquainted with the
term, it is a component estimating method since migration is
directly estimated and use is made of recorded vital statistics and
military changes. However, I have called it the "composite migration
method" since, as in the Bogue-Duncan composite method,
different age groups are estimated by different techniques, except that
this methodology estimates migration rather than total population.

It seemed advisable to use the DLAC file to estimate only the
migration of the 18-to-64 age group since the participation rate for
driver licenses falls off sharply below and beyond this age span. In
addition, those under 18 who do have licenses are probably very
poor at reporting address changes. Therefore, the migration of
those under 18 is derived from the Census Bureau Method II
procedure, which uses school enrollment. The estimate for those 65
and over is based on Medicare statistics.

The estimates of migration of the 18-to-64 group from the
driver license data are made by first assuming a 2-month lag in the
recorded statistics. This lag merely reflects the probable minimum
time for address changes to be processed. The mean lag for all
movers undoubtedly is longer, but it cannot be pinpointed at this
time. The reported number of out-migrants is adjusted for the
States not returning licenses, an adjustment of about 8 percent to
the total number of out-migrants. The adjusted net driver license
address changes are then expanded to represent the moves of all
persons 18 to 64 years on the basis of 1970 data and the field
survey, which show that about 12 percent of the population in this
age group do not have driver licenses.

An examination of a sample study of the origin of persons
obtaining a California license for the first time revealed that,
except for Canadians, very few of the new applicants from a
foreign country turned in a license. Therefore, immigration from
abroad of persons 18 years and over has been added to the driver license
migrants. This is particularly important in California, which
annually receives 70,000 to 80,000 immigrants. Since current
immigration is not reported by age for States, the proportion 18 years and
over is based on the 1970 census public use sample for immigrants
admitted 1965-70. The 18-years-and-over group is used since new
aliens are not eligible for Medicare. The distribution by county of
the immigrants is also based on the 1970 census data.

The  various  procedures  used  in  preparing  a  migration  estimate 
for  counties  are  summarized  in  exhibit  4.  The  total  population 
estimates  use  recorded  data  on  births  and  deaths  and  military 
changes. 

Exhibit  4.   Steps  in  Estimating  County  Migration  of  Population  Under 
65  Years  of  Age 

A.  Migration  under  18  years:   Use  Census  Bureau  Method  II 

B.  Migration  18-64  years  and  18  years  and  over  for  immigrants 

1)  18-64  from  United  States  and  Canada 

a.  In-migrants  (from  DLAC) 

b.  Out-migrants  (from  DLAC) 

c.  Out-migrants adjusted for nonreporting States (b x 1.078)

d.  Net  DLAC  migration  (a-c) 

e.  Estimated  total  migration-(d  x  1.12) 

2)  18  years  and  over  from  abroad 

a.  Total  (Immigration  and  Naturalization  Service  Annual  Report) 

b.  Less  immigration  from  Canada  (Ibid.) 

c.  Estimated  immigrants  18  years  and  over  (b  x  approximately  .61) 

d.  Emigrants  (c  x  .10  assumed) 

e.  Net  immigration  18  years  and  over  (c-d) 

3)  Total  migration  =  1)e  +  2)e

C.  Total  migration  under  65  years  (A  +  B) 
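
The arithmetic of exhibit 4 can be sketched in a few lines. The factors 1.078, 1.12, .61, and .10 are those given in the exhibit; the function and parameter names are hypothetical, the input counts in the example are invented, and step 2)c is read here as applying the .61 factor to total immigration less immigration from Canada. The under-18 component (Census Bureau Method II) is taken as an input rather than computed.

```python
def net_migration_18_to_64(dlac_in, dlac_out,
                           nonreporting_factor=1.078, license_factor=1.12):
    """Steps B.1)a-e: net DLAC migration expanded to all persons 18 to 64."""
    adjusted_out = dlac_out * nonreporting_factor   # step c
    net_dlac = dlac_in - adjusted_out               # step d
    return net_dlac * license_factor                # step e


def net_immigration_18_and_over(total_immigrants, canadian_immigrants,
                                adult_share=0.61, emigration_rate=0.10):
    """Steps B.2)a-e: net immigration from abroad, 18 years and over."""
    non_canadian = total_immigrants - canadian_immigrants   # steps a, b
    adults = non_canadian * adult_share                     # step c
    emigrants = adults * emigration_rate                    # step d
    return adults - emigrants                               # step e


def migration_under_65(migration_under_18, dlac_in, dlac_out,
                       total_immigrants, canadian_immigrants):
    """Step C: component A plus component B (steps B.1)e + B.2)e)."""
    part_b = (net_migration_18_to_64(dlac_in, dlac_out)
              + net_immigration_18_and_over(total_immigrants,
                                            canadian_immigrants))
    return migration_under_18 + part_b


# Invented county figures, for illustration only.
example = migration_under_65(migration_under_18=200, dlac_in=1000, dlac_out=500,
                             total_immigrants=1000, canadian_immigrants=100)
print(round(example, 2))
```

Keeping the two adjustment factors as parameters makes it easy to rerun the estimate if later record checks suggest a different allowance for nonreporting States or for persons without licenses.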

CONCLUSION 

The  first  opportunities  to  test  this  estimating  procedure  against 
a  census  will  be  coming  in  the  next  few  months.  The  Department 
of  Finance  will  be  conducting  censuses  in  about  20  counties  in  the 
next  2  years.  Needless  to  say,  we  are  eagerly  awaiting  the  oppor- 
tunity to  get  a  more  definitive  picture  of  the  accuracy  of  the  driver 
license  address  change  reports  for  migration  estimates. 
Estimating Population for
Census Tracts

Peter  K.  Francese 

National  Planning  Data  Corporation,  Ithaca,  N.  Y. 

INTRODUCTION 

By the fall of 1972, it became apparent that there would be no
mid-decade census. Without any prospect for national census
figures until 1980, there was a clear need for census tract population
estimates. This paper describes the efforts of National Planning
Data Corporation to estimate, on an annual basis, the population
of each of the approximately 32,000 census tracts in 243 standard
metropolitan statistical areas (SMSA's). Our estimates are as of
April 1st of each year, facilitating comparison with the decennial
census date.

Census tracts nest within counties, and there are several sources
for annual estimates of county population.¹ However, before
1972, no one had attempted to estimate annually, on a national
basis, the population of areas as small as census tracts. There are
several reasons why. Indicators of population change such as births,
deaths, school enrollment, and economic activity, which have been
used for county estimates, are not available for census tracts. Even
if they were available, it is doubtful that these indicators of
population change would be applicable to such small areas; individual
household records are preferable.

DATA  BASE  DEVELOPMENT 

Many  excellent  local  estimates  of  census  tract  population  rely 
on  building  permit  data  and  utility  connection  data.  Due  to  the 
multiplicity  of  local  governments,  these  data  are  not  obtainable  on 
a  uniform  national  basis.  Tract  estimates  using  local  data  are  hardly 
ever  available  for  an  entire  SMSA. 

There were four categories of information obtainable for census
tracts nationally which are useful in estimating population. Data
from these four categories (1960-to-1970 population change,
population density, 1970 census demographic characteristics, and
telephone/auto statistics) were transformed into approximately 70
independent variables to be used in a regression program. The
following paragraphs describe these four categories in more detail.

1. Population change from 1960 to 1970.² The past record
of population change is often used to predict future growth or
decline. Many population projections have been made by simply
extrapolating a past trend.³ Since 1960 population totals for
1970 census tracts were not published, it was necessary for our
staff to develop this information.


¹ The Federal-State Cooperative Program for Population Estimates of the
Bureau of the Census (reference 15) is one source. There are several private
companies which estimate county population.

² A more detailed explanation of how these data were obtained is given in
reference 6.

³ This is a naive method and is used only when no other data are
available. For our use, a past trend was included as one of the many
variables.


These data were obtained by comparing 1960 census tract
maps with 1970 census tract maps and quantifying the
differences due to any boundary changes. Once the relationship
between the 1960 and 1970 tracts had been determined, a 1960
population was assigned to every 1970 census tract. The
resultant trend data for small areas may be useful for other research
purposes.

2. Population density and land area.⁴ A census tract is
usually  a  small  area  and  rarely  exceeds  several  thousand  acres, 
though  the  largest  are  more  than  300  square  miles  in  area.  A 
phenomenon  that  has  been  observed  is  that  a  census  tract 
which  has  grown  in  population  during  the  past  several  years  may 
now  be  "filled"  with  dwelling  units  and  no  longer  have  the 
capacity  for  additional  growth.  This  has  been  described  as 
achieving  the  tract's  maximum  "fill  density."  As  a  result,  it  was 
felt  that  the  population  density  of  tracts  would  be  essential  for 
making  population  estimates.  Using  an  electronic  planimeter, 
our  staff  obtained  the  total  area,  the  land  area,  the  water  area 
and,  where  possible,  the  area  of  land  not  available  for  residential 
purposes  (e.g.,  parks,  industrial  development,  and  governmental 
or  institutional  facilities). 

Census  tracts  can  be  classified  on  the  basis  of  density.  A  very 
low  density  tract  is  usually  in  a  rural  area,  while  a  high  density 
tract  is  urban.  Different  variables  affect  population  change  in 
urban  and  rural  areas.  We  found  that  stratifying  census  tracts  on 
the  basis  of  density  yielded  significantly  improved  population 
estimates, whereas arbitrary stratification on the basis of inclusion
in the urbanized area did not significantly improve the
estimates.  Stratification  on  several  other  variables  (such  as 
income)  was  tried  and  yielded  poor  results.  Although  we  have 
not  pursued  this  area  of  research,  we  believe  that  population 
density  can  be  a  key  ingredient  in  short-range  (up  to  5  years) 
population  projections  for  small  areas. 

3. 1970 Census of Population and Housing Data. One determination
which was made was that for small areas there are
reasonably  consistent  and  measurable  relationships  between 
certain  demographic  characteristics  and  the  rate  of  population 
change.  For  example,  old  and  deteriorating  neighborhoods  tend 
to  lose  population  while  newer  suburban  areas  grow.  This  idea 
can  be  quantified  by  examining  such  characteristics  as  housing 
type  (rental,  owned,  single  family,  etc.),  housing  value,  age  and 
condition,  income,  family  structure,  and  employment. 

The  1970  census  population  and  housing  characteristics  were 
obtained from the Bureau of the Census Second- and Fourth-Count
Summary Tapes. Where possible, 100-percent information
was taken from the Second Count and supplemented with
the less accurate 20-, 15-, and 5-percent sample data from the
Fourth Count. A group of approximately 40 indicators of population
change was developed from the 1970 census information.

4.  Postcensal  indicators:  telephone  and  auto  data.  One  of 
the  reasons  that  population  estimates  for  census  tracts  have 
been  so  difficult  is  the  absence  of  postcensal  indicators.  The 
recent  development  of  the  address  coding  guide  and  address 
matching  computer  programs  has  partially  solved  this  problem. 
For  the  first  time,  data  on  individual  households  can  be  coded 
to  the  census  tract  level  and  used  as  indicators  of  population 
change. 

The  indicators  chosen  for  use  in  our  estimation  procedure 
were  the  annual  count  of  households  having  a  listed  telephone 
number  or  a  registered  automobile,  or  both.5  These  counts  were 

4  A  more  detailed  explanation  of  how  these  data  were  obtained  is  covered 
in  reference  6. 

5  These     figures     were     obtained     from     the     Reuben     H.     Donnelley 
Corporation. 
obtained by census tract coding telephone and automobile
mailing lists and counting nonduplicated addresses.
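
The count of households with a listed telephone, a registered automobile, or both is a union count over two address lists. A minimal sketch, assuming each list has already been address coded into (tract, address) records; the record layout, the simple normalization, and the sample addresses are invented for illustration and are not the vendor's actual procedure:

```python
from collections import defaultdict

def household_counts(telephone_records, auto_records):
    """Count nonduplicated addresses per tract across both lists.

    Each record is a (tract, address) pair. An address that appears on
    both lists, or more than once on either list, is counted once.
    """
    addresses_by_tract = defaultdict(set)
    for tract, address in list(telephone_records) + list(auto_records):
        # Crude normalization (assumption): trim and upper-case the address.
        addresses_by_tract[tract].add(address.strip().upper())
    return {tract: len(addrs) for tract, addrs in addresses_by_tract.items()}

telephone = [("0101", "12 Elm St"), ("0101", "40 Oak Ave"), ("0102", "7 Pine Rd")]
autos = [("0101", "12 elm st"), ("0101", "12 Elm St"), ("0102", "9 Pine Rd")]
print(household_counts(telephone, autos))  # {'0101': 2, '0102': 2}
```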

The telephone lists, obtained from phone directories, are updated
annually. Two problems which impair the effectiveness of
these data are that telephone directories are not updated uniformly
and that the percentage of persons with unlisted phone
numbers is not stable, either from place to place or from year to
year. The counts of registered automobiles, which are obtained
from State departments of motor vehicles, are quite accurate in
most places. However, rented cars do not appear and no motor
vehicle data are available for Connecticut or Oklahoma. Even
with these shortcomings, we feel these data are better than anything
else currently available.

At  this  time,  only  about  60  percent  of  all  census  tracts  are 
address coded. Therefore, our use of telephone and auto statistics
is restricted to those census tracts. As the address coding
guide  is  extended  to  all  census  tracts,  we  expect  our  estimates  to 
improve  accordingly. 

ESTIMATING  TECHNIQUE 

The  information  previously  described— 1960  population,  1970 
population  and  housing  characteristics,  population  density,  and 
telephone  and  auto  households— was  used  to  establish,  through 
regression  analysis,  the  population  growth  rate,  which  we  call  q. 

This  growth  rate  was  entered  in  the  estimating  equation  below. 

$$P_{b+n} = P_b + (q)P_b$$

where

$P_b$ = population at the base time, in this case, 1970

$P_{b+n}$ = population n years after the base time

$q$ = growth rate


The  estimated  population  was  at  all  times  the  population  in 
households.  The  count  of  persons  in  group  quarters  or  institutions 
was  assumed  to  be  constant. 
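
As a worked illustration of the estimating equation, the sketch below applies a growth rate to the base household population and then adds back the group quarters count unchanged. The function name and the numbers are hypothetical; only the arithmetic follows the text.

```python
def estimate_population(base_household_pop, growth_rate, group_quarters_pop=0):
    """Apply P(b+n) = P(b) + q * P(b) to the household population.

    Persons in group quarters or institutions are assumed constant,
    so they are added back without any growth applied.
    """
    household_estimate = base_household_pop + growth_rate * base_household_pop
    return household_estimate + group_quarters_pop

# Illustrative tract: 3,200 persons in households in 1970, q = 0.05,
# and 150 persons in group quarters.
print(estimate_population(3200, 0.05, 150))
```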


Test Data

Ideally, we would have randomly chosen a set of test census
tracts and conducted a complete count of the population in each.
In the reality of budgetary and time considerations, we obtained
the best local estimates of as many census tracts as possible. At the
present time, we have over 4,000 test tracts. Many of these local
estimates were very carefully prepared and yield accurate population
figures.


Regression Analysis

The regression program used was BMD02R. Comparisons were
made with the results from a double precision SPSS regression
program, but the results were not different enough to justify using
the more complex program. Prior to entering variables in the program,
the test tracts were partitioned into two groups: those with
telephone and auto data and those without. Tracts within these
groups were then stratified according to population density. The
table at the end of this paper displays some of the statistical information
about the equations we used for our 1973 estimates.

Since census tracts nest within counties, our census tract estimates
were summed to the county level and checked against
independently obtained county estimates.6 If there were more
than a 5-percent unexplained divergence between the two estimates,
our tract totals were adjusted.


CONCLUSION


This paper has described our research into the problem of estimating
population for census tracts. During the past two years, we
have concentrated on developing a data base and improving our
estimates. By 1975, we expect to have better and more complete
telephone and auto statistics. Also, upcoming special censuses will
provide more test data.

We expect to explore other statistical methods and other data
sources. It is our goal to publish an estimate of census tract population
every year until the next census.


6  Federal-State  Cooperative  Program  estimates  were  used,  where  available, 
and  supplemented  with  county  estimates  obtained  from  Market  Statistics, 
Inc.,  of  New  York,  N.Y. 
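
The county check that footnote 6 supports, in which tract estimates are summed to the county and compared with an independent county estimate, can be sketched as follows. The paper says only that tract totals "were adjusted" when the divergence exceeded 5 percent; the proportional scaling used here is an assumption, as are the function name and the sample figures, and the sketch does not model the judgment of whether a divergence is "unexplained."

```python
def control_to_county(tract_estimates, county_estimate, tolerance=0.05):
    """Scale tract estimates to a county control total when they diverge.

    tract_estimates: {tract: estimated population}
    county_estimate: independently obtained county population
    tolerance:       divergence (as a share of the county estimate)
                     above which the tracts are adjusted
    """
    tract_total = sum(tract_estimates.values())
    divergence = abs(tract_total - county_estimate) / county_estimate
    if divergence <= tolerance:
        return dict(tract_estimates)
    # Assumed adjustment rule: pro rata scaling to the county figure.
    factor = county_estimate / tract_total
    return {tract: pop * factor for tract, pop in tract_estimates.items()}

tracts = {"0101": 3000, "0102": 5000}
print(control_to_county(tracts, 10000))  # 20 percent apart: scaled
print(control_to_county(tracts, 8200))   # about 2.4 percent apart: unchanged
```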
Equation Information for 1973 Census Tract Estimate File

                      Number                           Standard      Mean      Standard
Equation             of test            Correlation    error of    growth  deviation of
number     Density     cases   F ratio  coefficient    estimate      rate   growth rate

01         Low           152     6.536        .5873      .07797    .07621        .10999
02                        65     4.353        .7934      .07820    .07531        .12541
03                        66     3.211        .5118      .08607    .05215        .10695
04                        70     3.695        .5471      .07922    .02040        .10219
05                        80     6.927        .7918      .04256    .02419        .07494
06                       106     2.958        .5646      .06395    .01142        .08081
07                       105     4.108        .5521      .04140   -.00184        .05426
08                       113     5.878        .7106      .05281    .00365        .08170
09                       112     4.063        .5415      .03594    .00459        .04673
10                        97     3.532        .6595      .03200    .00538        .04408
11                        82     7.821        .8135      .02346   -.01112        .04353
12                       109     4.802        .5912      .03844   -.00482        .05270
13                       129     7.968        .5659      .03073   -.00053        .04324
14                       121     6.904        .5942      .02545    .00312        .03629
15                        74     4.885        .6782      .02050   -.00197        .03020
16                       186    51.443        .8548      .02359    .00321        .05864
17         High          104    10.006        .6642      .01338    .00219        .02110
18         Low            86     5.202        .7394      .10698    .08527        .16857
19                        83     4.380        .7371      .05989    .02335        .09120
20                        69    12.796        .8596      .04570    .05136        .10029
21                        91     3.359        .6495      .07521    .03451        .10199
22                       113     3.784        .5916      .09109    .04934        .12121
23                       114     3.535        .5828      .06774    .02635        .08879
24                        92     3.851        .5203      .08682    .03218        .11073
25                       110     6.628        .7503      .05457    .03385        .09058
26                       139     6.717        .5733      .07677    .02873        .10729
27                       273     7.987        .5068      .05398    .01788        .07235
28                       138     4.657        .5766      .04837    .00049        .06539
29         High           93     6.123        .5812      .03559   -.00540        .04966

Note: Equations 18-29 used telephone and auto data.
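
The table's layout, one equation per density stratum fitted separately for tracts with and without telephone and auto data, can be illustrated with a much-reduced sketch. It fits an ordinary least-squares line with a single predictor per stratum and reports the correlation coefficient. This stands in for, and is not, the BMD02R runs on roughly 70 variables; the stratum labels, the predictor, and the data are invented for illustration.

```python
from collections import defaultdict
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope, intercept, and correlation for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

def fit_by_stratum(tracts):
    """tracts: iterable of (stratum, predictor, observed growth rate)."""
    groups = defaultdict(list)
    for stratum, x, q in tracts:
        groups[stratum].append((x, q))
    return {
        stratum: fit_line([x for x, _ in rows], [q for _, q in rows])
        for stratum, rows in groups.items()
    }

# Hypothetical test tracts: (density stratum, 1960-70 growth, local growth rate)
test_tracts = [
    ("low", 0.0, 0.01), ("low", 0.1, 0.03), ("low", 0.2, 0.05),
    ("high", 0.0, 0.02), ("high", 0.1, 0.01), ("high", 0.2, 0.00),
]
for stratum, (slope, intercept, r) in fit_by_stratum(test_tracts).items():
    print(stratum, round(slope, 4), round(intercept, 4), round(r, 4))
```

A stratum whose predictor or growth rate has no variation would cause a division by zero here; a fuller version would need to guard against that.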
Expansion of the Component Method
to Estimates of Population Subgroups
in Neighborhoods

Gangu  K.  Ahuja 

Government  of  the  District  of  Columbia 

INTRODUCTION 

The purpose of this paper is to demonstrate that the Component
Method can be used satisfactorily to estimate net migration
and to estimate population not only for the total population but
also for small age groups by sex and color.

This method is particularly useful as an alternative to the Composite
Method when vital statistics data are not readily available
but school enrollment data are. Besides, in the Composite Method
the dramatic decline in births in the recent past, partially due to
liberalization of abortion laws but probably due mainly to advances
in contraception, has made it rather difficult to prepare
reliable estimates of female population in the child-bearing age.

To our knowledge, Component Method II presently is being
used to estimate only total population; this paper shows how its
use can be extended to small age groups by sex and color. We note
that recently the Census Bureau independently worked on extended
use of this method and prepared satisfactory population estimates
by age groups for internal use only.

COMPONENT  METHOD 

The  Component  Method  essentially  consists  of  combining  the 
last  census  population  with  natural  increase  and  net  migration  for 
the  period  from  the  last  census  up  to  the  current  year: 

$$P_e = P_c + B - D + M$$

where

$P_e$ = estimated population for the current year

$P_c$ = population in the last census

$B$ = births from the last census up to the current year
$D$ = deaths from the last census up to the current year
$M$ = net migration from the last census up to the current year

The  required  annual  data  on  births  and  deaths  are  available  from 
local  agencies,  whereas  migration  data  are  not;  the  migration  for 
the  current  year  is  calculated  by  assuming  that  the  migration  rate 
of  the  total  population  equals  that  of  the  school  age  population. 
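
A minimal numerical sketch of the Component Method follows. It treats migration as net migration (in-migration minus out-migration, so a net outflow is negative) and derives it by applying the school-age migration rate to the census population, as the paragraph above describes. The function name and all figures are hypothetical.

```python
def component_estimate(census_pop, births, deaths, school_age_migration_rate):
    """Census population plus natural increase plus net migration.

    Net migration is approximated by applying the school-age net
    migration rate to the total census population (the method's
    stated assumption).
    """
    net_migration = school_age_migration_rate * census_pop
    return census_pop + births - deaths + net_migration

# Illustrative area: 100,000 persons at the census, 2,000 births,
# 900 deaths, and a school-age net migration rate of -2 percent.
print(component_estimate(100000, 2000, 900, -0.02))
```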

EXPANDED  COMPONENT  METHOD 

When  this  method  is  extended  to  small  age  groups  which  are  sex 
and  color  specific,  satisfactory  estimates  are  obtained  by  surviving 
the  base  year  population  in  each  group  and  adjusting  for  migration. 
The  base  year  population  is  survived  by  carrying  forward  5-year  age 


groups as enumerated in the last census for 5 years using life table
survival rates. The effect of 5-year mortality is expressed by the
survival ratio ${}_5L_{x+5} / {}_5L_x$ for each group; to illustrate, the female
population ages 20 to 24 in 1970 times the survival ratio
${}_5L_{25} / {}_5L_{20}$ yields an estimate of the age group of women 25-29
in 1975.
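
The survival step can be sketched in a few lines. The life-table values below are invented for illustration and are not taken from any D.C. life table; the function name is likewise hypothetical.

```python
def survive_cohort(population, L_start, L_end):
    """Carry a 5-year age group forward 5 years.

    L_start and L_end are the life-table person-years lived in the
    starting age interval (5Lx) and in the next interval (5Lx+5);
    their ratio is the 5-year survival ratio.
    """
    return population * (L_end / L_start)

# Hypothetical: 30,000 women aged 20-24 in 1970, with 5L20 = 480,000
# and 5L25 = 477,600, giving a survival ratio of 0.995.
print(survive_cohort(30000, 480000, 477600))
```

The result is the expected number of women aged 25-29 in 1975 before any adjustment for migration.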

For population estimates at higher ages, the following equation
can be used:

$$P^{t+5}_{a+5} = P^{t}_{a} - \tfrac{1}{2}(D_a + D_{a+5}) + \tfrac{1}{2}(M_a + M_{a+5})$$

where

$P^{t}$ = population at the beginning

$P^{t+5}$ = population at the end

$D$ = deaths

$M$ = net migration

$a$ = age

Similarly, population under 1 year:

$$P^{t+1}_{0} = B - f_0 D_0 + M_0$$

where $f_0$ refers to the separation factor for infant deaths.

The  Component  Method  calls  for  estimating  net  migration  for 
the cohort of school age population by comparing a current estimate
of school age population with the expected number derived
from  the  last  census.  This  migration  for  the  school  age  population 
is  then  converted  into  a  rate,  from  which  a  rate  of  migration  for 
the  total  population  is  estimated. 

Specifically,  it  involves  the  following  steps: 

1.  Calculate  the  actual  population  of  elementary  school  age 
by  adjusting  the  elementary  grade  enrollment  for  (1)  age-grade 
relationship  and  (2)  percent  of  the  school  age  population  in 
schools. 

2.  Calculate  the  expected  population  of  elementary  school 
age  on  the  estimate  date  by  surviving  the  population  in  the  same 
cohort  at  the  time  of  the  last  census. 

3.  Compute  migration  rate,  using  population  in  the  same  age 
cohort  at  the  time  of  the  last  census. 
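
The three steps above can be sketched as one small function. The multiplicative form of the two enrollment adjustments is an assumption, as are the parameter names and the sample figures; the paper does not give the exact adjustment formulas.

```python
def school_age_migration_rate(enrollment, age_grade_factor, enrollment_rate,
                              census_cohort, survival_ratio):
    """Net migration rate for the elementary school age cohort.

    enrollment:       elementary grade enrollment on the estimate date
    age_grade_factor: children of elementary school age per pupil enrolled
                      in elementary grades (age-grade adjustment)
    enrollment_rate:  share of the school age population that is in school
    census_cohort:    the same cohort as counted in the last census
    survival_ratio:   share of that cohort expected to survive to the
                      estimate date
    """
    # Step 1: actual school age population from adjusted enrollment.
    actual = enrollment * age_grade_factor / enrollment_rate
    # Step 2: expected population, surviving the census cohort.
    expected = census_cohort * survival_ratio
    # Step 3: net migration as a rate on the census cohort.
    return (actual - expected) / census_cohort

rate = school_age_migration_rate(95000, 1.0, 0.95, 104000, 0.99)
print(round(rate, 4))
```

A negative rate indicates net out-migration of the cohort since the census.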

The  expanded  Component  Method  calls  for  migration  estimates 
for  small  age  groups  by  sex  and  color  for  the  base  year  as  well  as 
the  year  of  estimate;  for  example,  using  the  data  from  the  1970 
census  reports  for  place  of  residence  during  1965-70,  the  migration 
rate  for  persons  below  5  years  can  be  calculated  as  follows: 


$$M_{ijk}(1970) = \frac{MI_{ijk}(1965\text{-}70) - MO_{ijk}(1965\text{-}70)}{P_{ijk}(1970)}$$

where

$M$ = net migration rate

$MI$ = in-migration

$MO$ = out-migration

$P$ = population

$i$ = age

$j$ = sex

$k$ = color


The  base  year  migration  rate  for  ages  below  5  years  can  be 
calculated  using  the  1970  population  adjusted  for  births  and  deaths 
occurring during the period of 1965-70:

$$M_{<5,jk}(1970) = \frac{P_{<5,jk}(1970) - \left[B_{<5,jk}(1965\text{-}70) - D_{<5,jk}(1965\text{-}70)\right]}{P_{<5,jk}(1970)}$$


where 

M  =  migration 

P  =  population 

B  =  births 

D  =  deaths 

j  =  sex 

k  =  color 

The age specific migration rate for the current year is calculated
in the same fashion as in the Census Component Method, i.e., it is
assumed that the migration rate in any given age group is the same
as in the school age population:

$$M_{ijkl} = M_{ijk}(1970) \times \frac{M_{(5\text{-}14)jkl}}{M_{(5\text{-}14)jk}(1970)}$$

where

$M$ = migration rate

$i$ = age

$j$ = sex

$k$ = color

$l$ = year of estimate
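
Read as a ratio adjustment, the equation above scales each group's 1970 migration rate by the change in the school-age (5-14) rate between 1970 and the estimate year. The sketch below follows that reading, which is an interpretation of the printed formula; the function name and the rates are hypothetical.

```python
def current_age_specific_rate(base_rate_age, base_rate_school, current_rate_school):
    """Age-specific migration rate for the estimate year.

    base_rate_age:       1970 migration rate for the age-sex-color group
    base_rate_school:    1970 migration rate for ages 5-14, same sex and color
    current_rate_school: estimate-year migration rate for ages 5-14
    """
    if base_rate_school == 0:
        raise ValueError("base school-age rate is zero; ratio is undefined")
    return base_rate_age * current_rate_school / base_rate_school

# Hypothetical group: 1970 rate of -0.10, while the school-age rate
# moved from -0.05 in 1970 to -0.04 in the estimate year.
print(current_age_specific_rate(-0.10, -0.05, -0.04))
```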

Application  of  Expanded  Component  Method 
to  D.C.  Population  Estimates 

Annual population estimates by age, sex, and color were prepared
for D.C., both at city level and at census tract level, based on
the Composite Method. Because of the dramatic decline in births in
recent years, last year (1972) there was a problem in producing
estimates of females in the child-bearing age. The total D.C. population
estimates also did not compare with independent estimates
prepared by the Census Bureau. An attempt was made to prepare
these estimates based on the expanded Component Method. This
method produced total estimates which were comparable with
those released by the Census Bureau. The estimates in the 15-44
age group were much higher than were obtained using the Composite
Method; however, all other broad age groups by sex and
color compared very well in both methods. The comparison of
estimates based on Composite and Component Methods for D.C.,
1971-72, is shown in tables 1 and 2.

Minor  Modification  in  Census  Component  Method  to  D.C. 

Coverage  of  school  series.— To  measure  the  change  in  school  age 
population in D.C., the coverage was restricted to 5-14 years. This
was done for two reasons:

1. To make this group comparable to the one in the Composite
Method, and

2. To make attendance virtually complete.

The public and nonpublic school enrollment data for D.C. residents
by grade, sex, and color were available from the D.C. Board
of  Education.  Data  for  only  grades  K-9  were  used,  with  special 
ungraded  classes  on  the  elementary  level  being  included.  In  1970, 
there  were  significant  differences  in  the  enrollment  data  from  the 
census  and  the  Board  of  Education;  these  differences  were  noticed 
both  in  private  and  public  school  enrollment  particularly  in  grades 
K-9.  Using  the  ratio  method,  corrections  were  applied  to  1971  and 
1972  D.C.  Board  of  Education  data  before  school  age  population 
estimates  were  made. 
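
The ratio correction can be illustrated as below. The paper does not spell out the ratio used; this sketch assumes it is the 1970 census enrollment divided by the 1970 Board of Education enrollment for the same grades, applied to later Board figures. The function name and the numbers are hypothetical.

```python
def ratio_corrected_enrollment(board_later, census_1970, board_1970):
    """Adjust a later Board of Education count to the census level.

    The 1970 census-to-Board ratio is assumed to hold in later years.
    """
    correction_ratio = census_1970 / board_1970
    return board_later * correction_ratio

# Hypothetical: the census counted 104,000 pupils in 1970 where the
# Board reported 100,000; the Board's 1971 figure is 98,000.
print(ratio_corrected_enrollment(98000, 104000, 100000))
```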

Survivors and life table.-Because of the unavailability of a 1970
Life Table for D.C., the 1960 Life Table was used to survive the
1970 population, assuming that specific death rates have not
changed significantly in the last decade. However, this was not true
for the below-5-years age group, since the infant mortality rate for
this group has been substantially reduced; therefore, the survivors
were related to current mortality data and to the estimated probability
of death for the below-1-year and the 1-to-4-years age groups.
Table 1. Estimated Population in 1971 by Age, Sex, and Color Based on Composite and Component Methods: District of Columbia

                                      Composite                                         Component
                                    White             Nonwhite                        White             Nonwhite
Age (years)             Total     Male   Female     Male   Female         Total     Male   Female     Male   Female

Total                 753,600   90,900  110,000  262,500  290,200       753,100   89,400  110,000  261,700  292,000

Under 5 years          59,900    3,800    4,200   26,300   25,600        59,000    3,400    3,300   26,900   25,400
5 to 9 years           65,100    3,200    2,900   29,800   29,200        65,100    3,200    2,900   29,800   29,200
10 to 14 years         64,900    3,200    3,400   29,400   28,900        64,900    3,200    3,400   29,400   28,900
15 to 19 years         66,300    6,600    7,700   23,900   28,100        62,500    4,700    5,500   25,400   26,900
20 to 24 years         77,700   12,500   13,600   21,800   29,800        81,100   13,000   14,600   23,000   30,500
25 to 29 years         64,400   10,300    9,400   20,500   24,200        65,500    9,900    9,800   21,100   24,700
30 to 34 years         47,400    6,800    5,800   16,400   18,400        48,800    6,500    5,600   17,500   19,200
35 to 39 years         42,600    5,300    4,600   15,100   17,600        41,700    5,100    4,200   15,100   17,300
40 to 44 years         42,300    5,200    5,300   14,500   17,300        41,900    5,000    4,900   15,000   17,000
45 to 49 years         40,600    5,000    8,500   12,000   15,100  }     82,500   10,400   13,200   26,900   32,000
50 to 54 years         44,500    5,700    5,200   20,500   13,100  }
55 to 59 years         36,500    5,100    7,400   11,500   12,500  }     69,800   11,800   16,800   18,800   22,400
60 to 64 years         33,400    4,900    8,600    8,400   11,500  }
65 years and over      68,000   13,300   23,400   12,400   18,900        70,300   13,200   25,800   12,800   18,500

5 to 14 years         130,000    6,400    6,300   59,200   58,100       130,000    6,400    6,300   59,200   58,100
15 to 44 years        340,700   46,700   46,400  112,200  135,400       341,500   44,200   44,600  117,100  135,600
45 to 64 years        155,000   20,700   29,700   52,400   52,200       152,300   22,200   30,000   45,700   54,400

Table 2. Estimated Population in 1972 by Age, Sex, and Color Based on Composite and Component Methods: District of Columbia

                                      Composite                                         Component
                                    White             Nonwhite                        White             Nonwhite
Age (years)             Total     Male   Female     Male   Female         Total     Male   Female     Male   Female

Total                 731,100   82,100  102,300  257,000  289,700       752,500   85,100  103,800  264,700  298,900

Under 5 years          61,700    3,100    2,900   28,100   27,600        61,000    3,200    3,100   27,600   27,100
5 to 9 years           61,300    2,700    2,400   28,100   28,100        61,200    2,700    2,400   28,100   28,000
10 to 14 years         63,700    2,900    2,800   29,300   28,700        63,500    2,900    2,800   29,300   28,500
15 to 19 years         64,500    3,900    4,600   26,100   29,900        60,600    3,600    3,700   26,100   27,200
20 to 24 years         72,000   10,200   11,600   21,600   28,600        80,400   12,200   14,600   22,900   30,700
25 to 29 years         63,600   10,100    9,400   20,600   23,500        68,100   10,100   10,000   21,800   26,200
30 to 34 years         45,400    7,200    6,300   15,200   16,700        50,800    6,800    5,700   18,100   20,200
35 to 39 years         39,100    5,400    4,800   13,400   15,500        41,600    4,900    4,100   15,200   17,400
40 to 44 years         35,800    4,000    4,300   12,800   14,700        40,700    4,800    4,600   14,500   16,800
45 to 49 years         44,100    4,800    6,000   15,200   18,100  }     84,200    9,400   11,500   28,800   34,500
50 to 54 years         41,000    4,600    5,900   14,300   16,200  }
55 to 59 years         35,900    5,300    7,700   10,400   12,500  }     69,100   11,200   15,800   18,900   23,200
60 to 64 years         32,300    5,600    8,000    8,600   10,100  }
65 years and over      70,700   12,300   25,600   13,300   19,500        71,300   13,300   25,500   13,400   19,100

5 to 14 years         125,000    5,600    5,200   57,400   56,800       124,700    5,600    5,200   57,400   56,500
15 to 44 years        320,400   40,800   41,000  109,700  128,900       342,200   42,400   42,700  118,600  138,500
45 to 64 years        153,300   20,300   27,600   48,500   56,900       153,300   20,600   27,300   47,700   57,700

Microdata Bases: Public Data Files for Socioeconomic Research

Introduction

John  C.  Beresford 
Data  Use  and  Access  Laboratories,  Arlington,  Va. 


To  solve  problems  at  the  local  or  national  level,  people  are 
forced  to  make  choices.  I  believe  that  if  information  collected  at 
public  expense  is  made  accessible  and  useful  it  will  increase  the  rate 
of  economic  and  social  return  on  the  public's  investment  in  data 
and  help  problem-solvers  make  choices. 

The purpose of these papers is to inform members of the association
about public data files that could be of great use to statisticians
whose work involves support of the decisionmaking process
that guides policy and administration. There are many public data
files stored on computer tape and accessible to statisticians. State
and local governments generate such files but have not, as yet,
made conscious efforts to stimulate their use outside the originating
agencies. The Federal Government is the source of most of
the public data files available to all statisticians at this time.

One  of  the  ways  we  can  learn  about  the  data  files  available  from 
the  government  is  through  the  Directory  of  Computerized  Data 
Files  and  Related  Software,  1974,  published  by  National  Technical 
Information  Service.  Although  the  introduction  to  the  Directory 
states  that  "data  files  are  available  only  in  summary  form,"  in  fact 
many  files  containing  unique  units  of  analysis  are  listed.  However, 
not  all  files  available  from  the  government  are  included  in  this  first 
edition.  In  the  papers  that  follow,  for  example,  you  will  read  about 
the  Current  Population  Survey  files  and  the  revenue  sharing  data 
files  (both  those  for  formula  computation  and  those  for  analysis  of 
revenue  use).  The  existence  of  the  NTIS  catalog  and  the  knowledge 
that  there  are  yet  other  files  not  listed  in  it  indicates  that  there  is  a 
large  storehouse  of  public  data.  Part  of  the  reason  for  this  session  is 
to  help  draw  attention  to  this  fact  and  to  suggest  uses  for  a  few  of 
these  public  files. 

Although  the  decennial  census  public  data  files  are  rich  in 
small-area  detail,  the  geographic  identifiers  in  most  public  data  files 
are  not.  In  general,  however,  many  kinds  of  subnational  totals  are 


available  from  public  data  files. 

The  first  two  papers  presented  here  deal  with  public  data  related 
to  revenue  sharing.  The  Office  of  Revenue  Sharing  uses  data  files 
created  by  the  Bureau  of  the  Census  and  described  by  Steven  M. 
Rudolph  of  the  Bureau.  Many  agencies  and  students  of  revenue 
sharing  activities  use  data  on  planned  and  actual  uses  of  revenue 
supplied by the Office of Revenue Sharing. Arthur L. Hauser describes
these files and the ORS data program. In a note following
these  papers,  other  revenue  sharing  data  files  are  described.  These 
were  prepared  for  the  National  Science  Foundation  for  use  in  the 
NSF-sponsored  program  in  Research  Applied  to  National  Needs 
titled  "Alternative  Formulae  for  General  Revenue  Sharing."  Trudi 
Lucas  of  NSF  and  Lawrence  L.  Brown,  III,  of  DUALabs  describe 
the  files  and  program. 

The  third  paper  deals  with  an  important  set  of  data  files  (which 
are  listed  in  the  NTIS  catalog),  the  Continuous  Work  History 
Sample  of  the  Social  Security  Administration.  In  his  paper,  David 
A. Hirschberg notes the emergence of a new 10-percent sample
from this source. Hirschberg's regular function at SSA is liaison
with outside users; he writes with authority for those of us who
may plan to use the files. Kathryn P. Nelson of the Oak Ridge
National Laboratories is one who has used these data extensively.
Her comments on the use of the files follow Hirschberg's paper.

The  final  paper,  by  Paul  T.  Zeisset  and  Larry  W.  Carbaugh  of 
the Data Users Services Division at the Bureau of the Census, provides
valuable information about a well-known general purpose
population characteristics file which has only recently been released
for public use: the Current Population Survey file. Some
earlier  versions  of  the  file  had  been  used  by  special  arrangement 
with  the  Bureau,  thus  allowing  Harold  Beebout  to  gain  experience 
he  shares  with  us  in  his  notes  following  the  Zeisset-Carbaugh 
paper. 
PUBLIC DATA FILES


Census  Data  for 
General  Revenue  Sharing 

Steven  M.  Rudolph 
Bureau  of  the  Census 



INTRODUCTION 

The  State  and  Local  Fiscal  Assistance  Act  of  1972  (general 
revenue  sharing)  was  enacted  by  Congress  and  signed  into  law  by 
President  Nixon  on  October  20,  1972.  The  act  was  designed  to 
give financial aid to State and local governments which were finding
it increasingly difficult to pay for the services they provided.
One unique aspect of this piece of legislation is that the funding
has been authorized for the entire program; it is not necessary that
Congress make annual appropriations. A trust fund was established
to distribute approximately $30 billion in the 5-year period
from January 1972 to December 1976. This gives the recipient
governments confidence that the funds are available and encourages
longer range plans to expend the funds. The recipient
governments  may  use  these  funds  without  restrictions,  as  if  they 
were  locally  generated,  to  finance  expenditures  in  the  nine  broadly 
defined  categories  spelled  out  in  the  revenue  sharing  act. 

ELIGIBLE  UNITS  OF  GOVERNMENT 

Another unique aspect of the program is that the Federal Government
distributes the funds directly (on a formula basis) to all
eligible units of government. All other formula grant programs
include only States and/or the very largest of local governments.
The legislation specifies that counties, municipalities, townships,
Indian tribes, and Alaskan native villages as well as the State governments
are to receive general revenue sharing funds. Parishes,
boroughs, villages, and other general purpose governments, as they
are included in one of the above categories, also are eligible. If a
government exists organizationally but is inactive, i.e., provides no
services, has no officers, and raises no revenues, it is not classified
as an existing government and would not be eligible for general
revenue sharing. These determinations and categories are the same
as those used by the Bureau of the Census for general statistical
purposes. There are approximately 38,800 general purpose governments
and 500 Indian tribes and Alaskan native villages.

THE  DATA 

Although the Census Bureau supplies nearly all of the data elements,
it has no responsibility or input regarding the distribution
of the funds or the application of the formulas. In fact, the Secretary
of the Treasury is given the authority by the legislation to use
alternate or modified sources of data if he determines that a more
equitable  distribution  will  result.  A  table  displaying  the  data  items 
used  in  the  distribution  formulas  and  the  sources  follows: 


Data item                              Initial source             Updated

A. Total population (State level)      1970 census                Annually--population estimates*
B. Urbanized area population           1970 census                Not updated
   (State level)
C. Population (local level)            1970 census                1973 population estimates
                                                                  (available 2/75)*1
D. Income (State level)                Bureau of Economic         Annually by BEA
                                       Analysis (BEA)
E. Income (local level)                1970 census                1973 income estimates
                                                                  (available 2/75)*1
F. Taxes and tax effort (total         FY 1971 governmental       Annually
   State and local)                    finance series*
G. Taxes and intergovernmental         Revenue sharing survey*    Annually*
   transfers (local level)
H. State income tax                    Congressional source       Annually by Census Bureau
I. Federal income tax collections      Congressional source
   by State

*Supplied by the U.S. Bureau of the Census.

1 The decision to apply the local-level income and population will be
made by the Department of the Treasury in the near future.

The legislation specifies that the data elements used in the distribution
formulas shall be those defined by the Bureau of the Census
for general statistical purposes. Two specific exceptions to the
Bureau's statistical classification systems were written into the Act:
the "Memphis Rule," which adjusts for a county sales tax distributed
to cities; and the "education tax adjustment," which excludes
any tax revenues generated for schools from the tax factors
used  in  the  local  distribution.  Aside  from  the  two  exceptions 
noted,  it  is  the  Bureau's  function  to  provide  the  data  as  they 
objectively  fit  the  longstanding  definitional  systems  which  it 
applies  in  its  ongoing  statistical  programs. 

IMPACT  ON  BUREAU  PROGRAMS 

Since  mid-1971  when  the  Congress  began  formulating  a  general 
revenue  sharing  program,  the  Census  Bureau  has  been  called  upon 
to  provide  extensive  services  and  a  variety  of  alternative  data  sets 
for  consideration.  Throughout  the  past  3  years  these  requests,  now 
channeled  through  the  Office  of  Revenue  Sharing  of  the  Depart- 
ment of  the  Treasury,  have  had  significant  impact  on  ongoing 
Bureau  programs.  In  the  area  of  governmental  finances,  the  data 
requirements  of  the  act  have  led  to  the  collection  of  tax  revenue 
data  on  an  annual  basis  from  all  general  purpose  local  governments 
recognized  by  the  Census  Bureau.  Our  early  experience  indicates 
that  the  yearly  collection  of  these  data  has  been  characterized  by 
improvement  in  the  information  supplied  by  the  localities  as  well 
as  improvements  in  the  Bureau's  collection  and  processing  of  these 
data. 

With regard to the other major census data elements (population
counts and measures of per capita income), the general revenue
sharing  program  has  encouraged  a  broadening  and  refining  of  our 
work  in  estimating  postcensal  population  counts  at  the  county  and 
subcounty  level.  Earlier  this  year,  the  Bureau  completed  pro- 
visional estimates  of  population  and  per  capita  income  for  calendar 
years  1972  and  1971,  respectively,  for  all  States  and  counties  in 
the  Nation.  By  early  1975,  we  will  have  completed  similar  post- 
censal estimates  for  all  units  of  local  government.  These  data  will 
be  made  available  to  the  Office  of  Revenue  Sharing  for  their  con- 
sideration for  application  in  the  computation  of  the  general  reve- 
nue sharing  allocations.  The  Census  Bureau  is  proceeding  with  its 
plans  to  publish  these  postcensal  population  estimates  as  part  of  its 
Current  Population  Report  series. 

The  Census  Bureau's  participation  in  the  general  revenue  sharing 
program  as  a  data  supplier  has  stimulated  our  own  programmatic 
work  and  our  visibility  to  the  recipient  governments  throughout 
the Nation. I feel that if the general revenue sharing program is
renewed  after  1977,  these  types  of  data  needs  that  have  surfaced 
will  be  given  strong  consideration  as  we  plan  for  the  1980  census. 


RELEVANT COMPUTER SUMMARY TAPE FILES

At the request of the Office of Revenue Sharing, the Data User
Services Division of the Bureau of the Census is preparing and
distributing copies of the special ORS computer tape file containing
the basic data elements, the allocation amounts, and the governmental
unit identification codes. At the moment, the data elements file,
which is available on a single reel of computer tape,
covers the first three entitlement periods. The tape file for
Entitlement Period 4 will probably be available later this year after ORS
has transmitted a working tape to the Bureau which will then be
converted to suitable format for industry-compatible equipment.

The Office of Revenue Sharing has also asked the Census Bureau
to assist them by distributing user tape copies of the actual use and
planned use report information supplied by the local governments
participating in the revenue sharing program. At this writing, the
Census Bureau technical staff is examining a working tape file
containing this information for Entitlement Periods 1, 2, 3, and 4.
Once we have converted this file from its present UNIVAC-oriented
format to an industry-compatible user format and completed the
accompanying technical documentation with the assistance of the
Office of Revenue Sharing, we will begin copying and
disseminating this file.


CENSUS  OF  GOVERNMENTS  SUMMARY  TAPES 

Employment  and  finance  data  will  also  be  presented  on  user 
tape  for  each  of  the  approximately  78,000  local  governments, 
including  school  districts  and  special  districts,  enumerated  in  the 
1972  Census  of  Governments  and  for  each  of  the  State  govern- 
ments. Data  summaries  will  also  be  made  available  for  States  and 
counties  by  function  or  type  of  government.  These  tape  files  will 
probably  become  available  in  early  September  together  with  the 
appropriate  technical  documentation. 

More  information  about  the  general  revenue  sharing  tapes  as 
well  as  the  Census  of  Governments  tapes  can  be  obtained  from  the 
Users  Service  Staff,  Data  User  Services  Division,  Bureau  of  the 
Census,  Washington,  D.C.  20233. 
General Revenue Sharing

as  Data  User  and  Data  Producer 

Arthur L. Hauser

Office  of  Revenue  Sharing 

INTRODUCTION 

While  politicians  discuss  the  directly  observable  socioeconomic 
impacts  of  the  general  revenue  sharing  program,  those  of  us  who 
are  responsible  for  sharing  the  revenues  are  refining  the  procedures 
we  used  to  "make  it  all  happen." 

Our  program  has  already  had  an  extraordinary  impact  on  the 
Census  Bureau's  ongoing  efforts  to  collect  revenue  data  from  gen- 
eral purpose  units  of  government.  In  the  past  year  and  a  half,  it  has 
become  clear  to  all  States  and  general  purpose  units  of  local  gov- 
ernment that  the  data  they  are  requested  to  provide  to  the  Census 
Bureau  on  general  tax  effort  and  intergovernmental  transfers  of 
funds  are  used  by  the  Treasury  Department  to  determine  alloca- 
tions of  shared  revenues. 

Data  supplied  to  Treasury's  Office  of  Revenue  Sharing  by  the 
Bureau  of  the  Census  are  applied  to  formulas  set  forth  in  Title  I  of 
the  State  and  Local  Fiscal  Assistance  Act  of  1972  in  order  to 
determine  the  allocation  of  revenue  sharing  money  to  each  of  more 
than  38,000  units  of  State  and  local  government. 

Now  that  the  governments  involved  are  aware  that  their  revenue 
sharing  dollars  depend  on  it,  their  responses  to  the  Census  Bureau's 
requests  for  information  have  improved  both  in  quantity  and  in 
quality.  This  not  only  makes  distribution  of  funds  by  the  Office  of 
Revenue  Sharing  more  accurate,  but  it  also  cannot  help  but  im- 
prove generally  the  many  other  research  and  planning  programs 
that  look  to  the  Census  Bureau  for  data  with  which  to  work. 

THE  ORS  DATA  IMPROVEMENT  PROGRAM 

In our efforts to further improve the data supplied by the Census
Bureau,  the  Office  of  Revenue  Sharing  regularly  mails  to  each  unit 
of  government  the  data  that  have  been  provided  for  that  unit, 
together  with  a  form  on  which  to  submit  proposed  corrections  for 
any  data  element  which  the  government  believes  to  be  in  error. 
Documentation  to  support  the  proposed  changes  is  also  required. 
This  administrative  procedure  to  identify  and  correct  data  errors  is 
known  as  the  ORS  Data  Improvement  Program.  Replies  from  State 
and  local  governments  showing  cause  for  data  revision  are  reviewed 
with  the  Bureau  of  the  Census. 

Three  data  improvement  programs  have  been  conducted  by  the 
Office  of  Revenue  Sharing  since  the  inception  of  the  general  reve- 
nue sharing  program. 

In  the  first,  data  used  to  calculate  allocations  of  funds  through 
June  30,  1973,  were  provided  to  recipient  governments  in  Decem- 
ber 1972.  Of  the  4,000  governments  that  questioned  one  or  more 
of  the  data  elements,  approximately  half  were  found  to  have  had 
adequate  reason  for  data  revisions.  Additional  changes  were  made 
as  a  result  of  the  Census  Bureau's  data  improvement  efforts. 

The second data improvement effort was initiated in October
1973, when figures to be used in making fiscal year 1974 allocations
were returned to all units of government for review. Only
about  2,000  governments  challenged  data,  and  only  850  such 
requests  for  change  were  found  to  have  merit.  Almost  10,000  data 
revisions  were  made,  however,  as  a  result  of  the  routine  data 
improvement  efforts  of  the  Census  Bureau  and  the  Office  of  Reve- 
nue Sharing. 

Data  to  be  used  in  calculating  fiscal  year  1975  (Entitlement 
Period  5)  amounts  were  provided  to  recipient  governments  in 
February  1974,  and  about  1,600  governments  responded  with 
proposed  changes.  Data  were  revised  for  approximately  750  gov- 
ernments as  a  result  of  this  effort.  Almost  all  of  the  proposed 
improvements  had  been  considered  and  acted  upon  before  initial 
allocations  of  fiscal  year  1975  amounts  were  made.  Accordingly, 
data  used  in  the  calculations  were  of  very  high  quality.  Completion 
of  the  data  improvement  program  before  the  initial  allocation  for  a 
given  period  is  an  important  step  toward  minimizing  the  need  for 
future  adjustments. 

Since  allocations  of  all  funds  are  dependent  on  hundreds  of 
thousands  of  bits  of  data  relating  to  each  general  purpose  unit  of 
government  in  the  United  States,  it  is  essential  that  these  data  be 
accurate.  The  Office  of  Revenue  Sharing  has  launched  a  major 
study  to  evaluate  current  and  alternative  data  sources  and  strategies 
for  the  purpose  of  minimizing  data-based  inequities.  Its  objectives 
are  as  follows: 

1.  To  determine  the  relative  effects  on  equity  of  revenue 
sharing  allocations  of  varying  degrees  of  currency,  comprehen- 
siveness, and  accuracy  of  each  of  the  data  elements  used  in  the 
allocation  formulas. 

2.  To  determine  the  degree  of  inequity  that  would  result  in 
each  of  the  next  5  years  if  present  data  sources  were  to  be  used, 
and  the  impact  on  States  and  local  jurisdictions  with  signifi- 
cantly differing  characteristics. 

3.  To  identify  alternative  sources  of  data  for  each  of  those 
data  elements  that,  if  present  sources  were  to  be  used,  would 
result  in  significant  inequity  of  allocations. 

4.  To  prepare  and  document  a  set  of  alternative  data  strate- 
gies and  to  make  recommendations  as  to  which  strategy  should 
be  followed. 

The  study  is  being  conducted  by  the  Stanford  Research  Insti- 
tute and  is  due  to  be  completed  in  September  1974. 

PLANNED  AND  ACTUAL  USE  REPORTS 
ON  SHARED  REVENUE 

In  addition  to  our  joint  efforts  with  the  Census  Bureau  to 
improve  data  used  in  allocating  funds,  the  Office  of  Revenue 
Sharing  is  generating  information  regarding  individual  governments' 
plans  for  and  actual  uses  of  shared  revenues. 

In  May  of  each  year,  a  form  is  sent  to  each  eligible  government 
on  which  that  government  must  report  its  plans  for  uses  of  money 
it  will  receive  for  the  following  entitlement  period  (equivalent,  at 
present,  to  the  following  Federal  fiscal  year).  When  completed, 
these  Planned  Use  Report  forms  must  be  published  locally  in  news- 
papers of  general  circulation  and  returned  to  the  Office  of  Revenue 
Sharing  by  a  specified  date.  (In  the  case  of  the  forms  that  were 
distributed  in  May,  the  due  date  was  June  24,  1974.) 

At  the  end  of  each  entitlement  period,  an  Actual  Use  Report 
form  must  be  completed  by  each  unit  of  government  with  informa- 
tion about  the  amounts  of  money  that  government  has  used, 
appropriated,  and/or  obligated  in  the  same  general  categories  of 
expenditure. 
Changes may be made in a government's plans for uses of shared
revenues  after  its  Planned  Use  Report  has  been  filed,  but  Actual 
Use  Report  data  reflect  firm  commitments  of  funds. 

The  Office  of  Revenue  Sharing  has  made  every  effort  to  make 
the  Planned  and  Actual  Use  Report  forms  simple  enough  to  be 
handled  by  the  volunteer  officials  of  very  small  villages,  yet  com- 
plete enough  to  provide  information  on  general  categories  of 
planned  expenditures,  as  required  by  law. 

It  is  essential  to  the  concept  and  administration  of  the  general 
revenue  sharing  program  that  these  one-page  forms  be  kept  as 
simple  as  possible,  for  Congress  and  the  Administration  have 
always  agreed  that  revenue  sharing  money  should  be  virtually 
"strings  free"  and  uncomplicated  both  in  delivery  and  in  use. 

In  addition,  since  the  reports  must  be  published  by  all  recipient 
governments,  we  are  anxious  to  minimize  publication  costs  by 
keeping  the  size  of  the  form  small. 

The fact that the Office of Revenue Sharing requires only very
generalized information relating to categories of expenditure
(capital as opposed to operating and maintenance expenditures) and
effects on tax levels poses something of a dilemma for those
who have become accustomed to the very detailed monitoring by
Federal agencies of categorical aid grants.

The  criticism  has  been  made  of  our  program  that  the  data  we 
collect  do  not  provide  information  on  uses  of  shared  revenues 
down  to  the  project  level.  Accordingly,  our  critics  say,  it  is  impossi- 
ble to  measure  the  impacts  of  the  general  revenue  sharing  program 
using  data  collected  by  the  Office  of  Revenue  Sharing. 

To  be  sure,  we  do  not  require  detailed  information,  for  reasons  I 
have  already  cited.  But  given  the  size  of  the  universe  involved  in 
our  data-collection  efforts,  and  given  the  longstanding  intent  of 
Congress that ours be a generalized form of Federal financial
assistance, we feel that our data-collection efforts are entirely
appropriate  to  this  program  at  this  time. 

The  data  we  collect  on  plans  for  and  actual  uses  of  the  funds  we 
distribute come to us as amounts of dollars to be spent in such
areas of activity as public safety, environmental protection, public
transportation,  health,  recreation,  libraries,  social  services  for  the 
aged  or  poor,  financial  administration,  multipurpose  and  general 
purpose  governments,  education,  social  development,  housing  and 
community  development,  economic  development  and  "other." 
Recipient  governments  are  asked  to  indicate  how  much  money,  in 
numbers  of  dollars,  they  have  devoted  to  capital  projects  and  how 
much  to  operating  and  maintenance  for  each  category  that  applies. 
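
The layout of a use report described above can be pictured as a small record: one pair of dollar amounts (capital, operating and maintenance) for each category that applies. The following Python sketch is only an illustration under that reading. The category labels are copied from the text, while the function name, the dictionary layout, and the dollar figures are hypothetical and do not come from the ORS forms or tapes.

```python
# Category labels as listed in the text; everything else here is illustrative.
CATEGORIES = [
    "public safety", "environmental protection", "public transportation",
    "health", "recreation", "libraries",
    "social services for the aged or poor", "financial administration",
    "multipurpose and general purpose governments", "education",
    "social development", "housing and community development",
    "economic development", "other",
]

def report_totals(report):
    """Return (capital, operating_and_maintenance) totals in dollars.

    `report` maps a category name to a (capital, operating) pair.
    Categories that do not apply are simply absent from the mapping.
    """
    for category in report:
        if category not in CATEGORIES:
            raise ValueError("unknown category: " + category)
    capital = sum(cap for cap, _ in report.values())
    operating = sum(op for _, op in report.values())
    return capital, operating

# Hypothetical report from one small government (whole dollars).
example_report = {
    "public safety": (12000, 3000),
    "libraries": (0, 1500),
}
capital_total, operating_total = report_totals(example_report)
print(capital_total, operating_total)
```

Keeping absent categories out of the mapping, rather than storing zeros, mirrors the instruction that amounts be given only "for each category that applies."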

Multiple  choice  questions  are  also  asked  about  what  methods 
recipients  use  for  audit  and  about  any  effects  shared  revenues  may 
be  having  on  their  local  tax  levels. 

As  forms  are  returned,  they  are  edited  by  ORS  staff  who  deter- 
mine whether  all  the  required  information  has  been  provided. 
Completed  forms  are  bundled  and  sent  to  an  Internal  Revenue 
Service  data-processing  center  where  data  tapes  are  prepared.  The 
tapes  are  then  used  for  administrative  purposes  and  for  analysis.  A 
tape  which  contains  the  Planned  Use  Reports  for  Entitlement 
Periods  3  and  4  and  a  tape  containing  the  Actual  Use  Report  data 
as  of  June  30,  1973,  are  available  for  sale  by  the  Users  Service 
Staff,  Data  User  Services  Division,  Bureau  of  the  Census,  Washing- 
ton, D.C.  20233. 

The  aggregated  data  are  available  for  inspection  and  study  in  the 
Office  of  Revenue  Sharing,  however.  A  voluminous  file  of  print- 
outs includes  the  aggregated  data  by  State,  by  type  of  government, 
by  size  and  type  of  government,  and  by  category  of  expenditure. 

We  have  published  two  reports  containing  analysis  of  the  data 
from  the  Planned  and  Actual  Use  Reports.  The  first  of  these,  based 
on  Entitlement  Period  3  Planned  Use  Reports,  was  released  in 
September  1973.  The  other,  dated  March  1974,  is  largely  con- 
cerned with  Actual  Use  Reports  in  which  recipient  governments 
reported  their  obligations  and  expenditures  to  June  30,  1973,  of 
shared  revenues  received  by  them  through  that  date. 

We  invite  your  comments  and  suggestions  regarding  the  data  we 
use  and  the  data  we  produce. 
Comments on Public Data Files
for  Socioeconomic  Research 

Trudi  Lucas,  National  Science  Foundation 
Lawrence L. Brown, III, Data Use and Access
Laboratories,  Arlington,  Va. 


INTRODUCTION 

The  Research  Applications  Directorate  (RANN)  of  the  National 
Science  Foundation  (NSF)  has  a  mission  to  fund  research  which 
will  be  of  use  to  policymakers  at  the  Federal,  State,  and  local 
levels.  As  part  of  RANN's  activities,  it  is  currently  sponsoring  a 
sizable  research  program  on  general  revenue  sharing.  This  program 
has led to the early release of public data relevant to studies of
local  government,  as  well  as  of  revenue  sharing. 

The  first  activity  RANN  supported  in  the  area  of  research  on 
general  revenue  sharing  was  a  conference  of  academics  and  policy- 
makers in  December  1973.  At  that  conference  it  became  apparent 
that  the  greatest  research  need  was  access  to  data  and  program 
tapes.  Representatives  from  the  Office  of  Revenue  Sharing 
expressed  their  willingness  to  make  the  information  public.  How- 
ever, with  a  staff  of  well  under  100,  ORS  found  itself  overwhelmed 
by  requests  for  complex  information  from  large  numbers  of  eager 
researchers. 

In  the  month  following  the  conference,  RANN  and  ORS  agreed, 
in  essence,  that  if  NSF  would  bear  the  cost  of  making  information 
available  to  researchers,  ORS  would  assist  in  the  provision  of  rele- 
vant programs  and  data  tapes.  Two  awards  implemented  this  agree- 
ment. One  award  went  to  Westat,  Inc.,  which  is  working  with  ORS 
to  provide  a  Fortran  version  of  the  program  ORS  uses  to  determine 
allocation  to  governments,  pursuant  to  Sections  107-109  of  the 
State  and  Local  Fiscal  Assistance  Act  of  1972.  (For  information 
contact  Mr.  Thomas  Jones,  Westat,  Inc.,  1160  Nebel  Street,  Rock- 
ville,  Md.  20852.  Telephone  301-881-5310.) 

The  second  award  went  to  Data  Use  and  Access  Laboratories 
(DUALabs)  for  the  preparation  of  data  tapes  to  be  used  in  describ- 
ing distributions  made  under  the  present  formulas  and  in  designing 
alternatives  to  the  existing  formulas.  To  execute  these  analyses 
three  types  of  data  are  essential:  Data  on  the  allocations  made  for 
over  38,000  general  purpose  governments;  socioeconomic  data 
drawn  from  the  1970  Census  of  Population  and  Housing;  and  data 
on  tax,  revenue,  and  structure  drawn  from  the  1972  Census  of 
Governments.  Other  relevant  data  tapes  are  also  covered  under  the 
award.  The  time-consuming  task  involved  in  preparing  these  tapes 
requires  matching  1970  enumeration  district,  block  group,  and 
place  codes  with  government  jurisdiction  codes  and  ORS  account 
numbers  (virtually  the  same  as  census  of  governments  codes). 
Another  problem  is  access  to  the  basic  data  tapes,  though  both  the 
Office  of  Revenue  Sharing  and  the  Census  Bureau  are  cooperating 
in  helping  DUALabs  acquire  the  necessary  files. 

As  a  result  of  this  program,  several  public  data  files  are  now 
available  for  research  on  revenue  sharing.  A  brief  description  of 
these  files  follows.  (For  further  information,  contact  Mr.  Robert 
Gignilliat,  DUALabs,  1601  North  Kent  Street,  Suite  900, 
Arlington, Va. 22209.)


REVENUE  SHARING  DATA  FILE 
SUMMARY  DESCRIPTIONS 

1.  Fourth  Count  Census/ORS  Data  Elements  "Maxi"  File 

This  data  set  was  created  by  matching  selected  items  from  the 
1970  Census  Fourth  Count  Population  and  Housing  (Files  B  and  C) 
summary  tapes  and  data  from  the  Office  of  Revenue  Sharing  Data 
Elements tape for Entitlement Period 1.

Over  2,500  socioeconomic  items  are  furnished  for  each  jurisdic- 
tion including  distributions  for  age,  sex,  household  relationship, 
income,  poverty  status,  education,  occupation,  industry,  and  hous- 
ing value  and  rent.  ORS  variables  include  government  name,  1970 
population  (sample  and  complete  count),  allocation  amounts,  per 
capita  income,  adjusted  taxes,  and  intergovernmental  transfers  for 
Entitlement Period 1 only. Socioeconomic variables are generally
shown  for  six  race-by-residence  universe  combinations.  That  is,  for 
any  one  government,  data  are  summarized  for  black,  white, 
Spanish  American  and  all  races  by  total  and  by  urban  residence 
categories.  (Rural  totals  can  be  obtained  by  subtraction.) 
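
The subtraction mentioned in the parenthetical note can be sketched in a few lines of Python. The field names and counts below are hypothetical and are not taken from the file documentation; the only point illustrated is that a rural figure is the total figure minus the urban figure for the same race universe.

```python
# Hypothetical counts for one government; the values are illustrative only.
total_by_race = {"all_races": 12000, "white": 9000,
                 "black": 2500, "spanish_american": 500}
urban_by_race = {"all_races": 8000, "white": 6200,
                 "black": 1500, "spanish_american": 300}

def rural_counts(total, urban):
    """Derive rural counts as total minus urban for each race universe."""
    return {race: total[race] - urban[race] for race in total}

rural_by_race = rural_counts(total_by_race, urban_by_race)
print(rural_by_race)
```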

The  "maxi"  file  contains  summaries  for  all  ORS-recognized 
State  and  county  governments  and  for  most  township  governments 
and municipalities with 1970 populations of 2,500 or more. Places
with fewer than 2,500 inhabitants, special areas, Indian reservations,
and certain specially defined larger areas in Minnesota, North
Dakota, and several other States are not present on this file.
For  those  governments  which  are  present,  both  decennial  census 
and  government  census  geographic  codes  are  furnished. 

2.  Fourth  Count  Census/ORS  Data  Elements 
"Mini"  File 

This  file  was  created  by  extracting  selected  data  items  from  the 
Fourth  Count/ORS  "maxi"  File. 

A  "maxi"  file  census  data  subset  of  approximately  200  socio- 
economic data  items  is  furnished  for  each  jurisdiction.  Most  major 
"maxi"  file  variables  are  accounted  for;  however,  they  are  provided 
in  less  detail,  in  most  instances  for  the  total  population  universe 
only.  All  ORS  data  items  have  been  carried  over  from  the  "maxi" 
file  and  are  present  in  this  data  set.  Geographic  detail  is  the  same  as 
for  the  "maxi"  file. 

3.  Special  Fifth  Count  Census/ORS  Data  Elements 
Summary  File 

The  demographic  information  in  this  file  was  extracted  from  a 
special  1970  Census  of  Population  and  Housing  tabulation  of 
sample  data  for  enumeration  districts  (ED's)  and  block  groups 
(BG's).  Selected  tables  from  this  file  were  extracted,  sorted  and 
reaggregated  to  create  State,  county,  township,  and  municipality 
groupings.  Where  necessary,  a  special  clerical  operation  was  under- 
taken and  governments  were  defined  in  terms  of  ED's  and  BG's  or 
portions  of  these.  Use  of  this  file  permitted  the  creation  of  socio- 
economic data  summaries  for  most  small  municipalities  and  special 
areas  not  present  in  the  1970  Census  Fourth  Count  "maxi"  or 
"mini"  files.  It  should  be  noted,  however,  that  certain  Indian 
reservations,  tribal  councils,  and  Alaskan  villages  are  nongeographic 
entities.  Demographic  information  for  these  special  areas  was 
unobtainable;  therefore  records  for  such  places  contain  ORS  infor- 
mation only. 
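
The reaggregation step described above can be sketched as a sum of ED and BG counts grouped by government. The text does not say how a "portion" of an ED or BG was apportioned in the clerical operation, so the fractional share used below is purely an assumption for illustration, as are all identifiers and counts.

```python
from collections import defaultdict

# (area id, government id, share of the area assigned to that government).
# Identifiers and shares are hypothetical; proportional shares are an
# assumption, not the documented clerical rule.
assignments = [
    ("ED-0001", "GOV-A", 1.0),
    ("BG-0002", "GOV-A", 0.5),
    ("BG-0002", "GOV-B", 0.5),
]
area_population = {"ED-0001": 800, "BG-0002": 600}

def reaggregate(assignments, counts):
    """Sum area counts into government totals, weighting by assigned share."""
    totals = defaultdict(float)
    for area, government, share in assignments:
        totals[government] += counts[area] * share
    return dict(totals)

government_population = reaggregate(assignments, area_population)
print(government_population)
```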

Once  the  Fifth  Count  file  was  created  it  was  matched  with  the 
ORS  Data  Elements  file  and  a  combined  data  set  for  most  ORS- 
recognized  governments  was  created. 
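
A minimal Python sketch of this kind of match follows, assuming both files can be keyed by a common government identifier. The key values, field names, and figures are hypothetical. Every ORS record is kept, and census items are attached only where a census summary exists, which is one way to obtain records that "contain ORS information only" for nongeographic entities.

```python
# Hypothetical records keyed by a shared identifier (illustrative values).
ors_records = {
    "01001001": {"name": "Example County", "allocation": 150000},
    "02900123": {"name": "Example Village Council", "allocation": 4000},
}
fifth_count = {
    "01001001": {"population": 24000, "per_capita_income": 2900},
    # No entry for 02900123: a nongeographic entity with no census summary.
}

def match_files(ors, census):
    """Keep every ORS record; attach census items where a match exists."""
    combined = {}
    for key, ors_fields in ors.items():
        record = dict(ors_fields)
        census_fields = census.get(key)
        record["has_census_data"] = census_fields is not None
        if census_fields:
            record.update(census_fields)
        combined[key] = record
    return combined

combined = match_files(ors_records, fifth_count)
print(combined["02900123"])
```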
Approximately 600 socioeconomic items (about 30 tables) are
provided  for  each  jurisdiction.  Information  is  considerably  more 
detailed  than  that  provided  by  the  "mini"  file  but  less  extensive 
than  that  contained  in  the  "maxi"  file.  Many  items  are  shown  for 
white,  black,  Spanish  American,  and  total  race  groups;  however, 
information  on  residence  is  restricted  to  a  simple  count  of  urban, 
rural  farm,  and  rural  nonfarm  persons.  All  ORS  data  elements  con- 
tained in  the  "maxi"  and  "mini"  files  are  present  in  the  Fifth 
Count  file. 

The  file  contains  summaries  for  all  State,  county,  township,  and 
municipal  governments.  All  special  areas  are  present  as  well. 

4. DUALabs Reformatted 1972 County and City Data Book File

The  1972  County  and  City  Data  Book  (CCDB)  was  prepared  by 
the  Bureau  of  the  Census  in  order  to  present  a  variety  of  summary 
statistical  information  about  counties,  standard  metropolitan  sta- 
tistical areas  (SMSA's),  cities,  urbanized  areas,  unincorporated 
places,  States,  and  the  United  States. 

The DUALabs 1972 County and City Data Book file was
created as a supplement to the CCDB. All information contained in
the book is also in the DUALabs file, with the exception of the
information contained in the appendixes and tables 1 and 1A of the
book. Also, the CCDB contains descriptive texts, source notes, and
footnotes which are essential for reliable interpretation of the data.

The  DUALabs  file  has  been  created  from  the  Census  Bureau's 
file.  The  records  in  the  DUALabs  file  have  been  restructured  so 
that  the  256  flag  fields  are  located  at  the  end  of  each  logical  record 
rather  than  preceding  the  256  individual  data  fields.  Also,  the 
tables  have  been  restructured  so  that  in  the  file  there  are  20  tables, 
10  composed  of  data  fields  and  10  composed  of  flag  fields. 
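
The record restructuring can be illustrated with a short Python sketch. It assumes that each original logical record holds all 256 flag fields followed by all 256 data fields; the text could also be read as describing interleaved flag and data pairs, and the actual tape layout is not given here. The function name and sample values are hypothetical.

```python
N_FIELDS = 256  # flag fields and data fields per logical record, per the text

def move_flags_to_end(record):
    """Reorder [flags..., data...] into [data..., flags...]."""
    if len(record) != 2 * N_FIELDS:
        raise ValueError("expected 256 flag fields followed by 256 data fields")
    flags, data = record[:N_FIELDS], record[N_FIELDS:]
    return data + flags

# Illustrative record: flags are strings, data fields are integers.
original = ["F%d" % i for i in range(N_FIELDS)] + list(range(N_FIELDS))
restructured = move_flags_to_end(original)
print(restructured[0], restructured[N_FIELDS])
```

With the flags at the end, a program that needs only the data values can read the first 256 fields of each record and ignore the remainder.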

5.  Office  of  Revenue  Sharing  Master  File 

These  files  contain  the  ORS  Revenue  Sharing  Data  Elements  for 
Entitlement Periods 1, 2, and 3.

In  addition  to  the  allocation  amount  for  each  entitlement 
period,  records  include  1970  census  population  counts  (sample  and 
complete  count),  per  capita  income,  adjusted  taxes,  and  inter- 
governmental transfers  for  each  recipient  government. 

Records  are  available  for  each  jurisdiction  participating  in  the 
general  revenue  sharing  program. 

6.  General  Revenue  Sharing  Survey  Files 

Adjusted  tax  and  intergovernmental  transfer  data  are  collected 
annually  by  the  Census  Bureau  from  all  recipient  governments 
that  have  independent  school  districts  via  forms  RS-8  or  RS-9. 
When the Census Bureau compiles the data from official sources
(e.g., a government's audit or financial report, or a State
report), it completes form RS-8 and sends the completed form to
the  recipient  for  verification.  When  the  Census  Bureau  seeks  the 
original  data  from  the  recipient  it  will  send  form  RS-9  for  the 
recipient  to  complete.  For  recipients  with  dependent  school 
districts,  the  information  is  obtained  via  form  RS-12.  RS-8,  RS-9, 
and  RS-12  data  for  fiscal  year  1973  (July  1,  1972,  through  June 
30,  1973)  are  contained  on  computer  tape. 
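
The choice among the three forms, as described in this paragraph, can be summarized in a small Python function. This is a sketch of the rule as stated in the text only; the function and parameter names are hypothetical, and any further conditions the Census Bureau may apply are not represented.

```python
def survey_form(has_dependent_school_district, compiled_from_official_sources):
    """Return the form number implied by the description in the text."""
    if has_dependent_school_district:
        return "RS-12"
    # Independent school districts: RS-8 when the Bureau compiles the data
    # itself and sends it for verification, RS-9 when the recipient is asked
    # to supply the original data.
    return "RS-8" if compiled_from_official_sources else "RS-9"

print(survey_form(False, True))
```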

Records  contain  each  jurisdiction's  population  and  detailed 
adjusted  tax  revenue  and  intergovernmental  transfer  payments. 
Information  includes  the  amounts  received  from  property  taxes, 
general  sales  taxes,  gas  taxes,  liquor  taxes,  utility  taxes,  selected 
sales  taxes,  income  taxes,  and  amounts  received  from  various  other 
tax revenue categories. Federal, State, and local intergovernmental
payments for health, education, welfare, road maintenance, and
other  services  are  also  included.  In  all,  dollar  amounts  for  36 
different  tax  and  transfer  payment  categories  are  represented. 

Data  cover  all  recipient  governments  for  which  a  fiscal  year 
1973 RS-8, RS-9, or RS-12 form was completed.

7. "Mini" File/General Revenue Sharing Survey Matched Tape

This  data  set  will  be  created  by  matching  records  from  the 
Fourth  Count  "Mini"  File  with  data  obtained  from  the  General 
Revenue  Sharing  Survey  files.  Records  will  contain  the  full  comple- 
ment of  "Mini"  and  General  Revenue  Sharing  File  data.  Geo- 
graphic detail  will  be  like  the  "maxi"  and  "mini"  files. 

8.  Special  Fifth  Count  Census/General  Revenue  Sharing 
Survey  Matched  File 

This  file  will  be  created  by  matching  the  Special  Fifth  Count 
Census/ORS  and  the  General  Revenue  Sharing  Survey  Files.  Data 
content  will  be  as  described  for  the  Special  Fifth  Count  and 
General  Revenue  Sharing  Survey  data  bases.  The  file  will  contain 
records  for  all  recipient  governments  included  in  the  Special  Fifth 
Count  Census/ORS  tapes. 

9.  1972  Census  of  Governments  Finance  Data 

The  file  contains  detailed  information  about  the  finances  of 
general  purpose  governments,  special  purpose  districts,  school 
districts,  and  other  areas.  Data  are  collected  and  compiled  quin- 
quennially  for  the  census  of  governments.  Information  contained 
in  these  files  is  essentially  the  same  as  that  which  is  released  for  the 
1972  Census  of  Governments  Vol.  4  publications  on  government 
finances. 

Detailed  statistics  are  provided  on  finances  for  such  major  items 
as  revenue  by  source,  expenditures  by  character  and  object  and  by 
function,  indebtedness  and  debt  transactions,  cash  and  security 
holdings,  and  finances  for  government-operated  utilities. 

Data  are  tabulated  for  States,  counties,  independent  school  dis- 
tricts, selected  special  districts,  municipalities  and  townships  with 
populations of 10,000 or more, cities, the United States, and regions.


10.  1972  Census  of  Governments  Employment  Data 

This  file  contains  information  comparable  to  that  released  in  the 
1972  Census  of  Governments  Vol.  3  series  on  public  employment. 
Detailed  technical  information  is  not  yet  available. 

Comprehensive  statistics  will  be  provided  on  governmental 
employment  and  payroll  by  level  of  government  and  function  for 
the  United  States,  regions,  States,  counties,  municipalities  and 
selected  townships  of  10,000  or  more  inhabitants,  school  districts 
with  3,000  or  more  pupils,  and  special  districts  with  100  or  more 
full-time  employees. 

11. 1972 Census of Governments Name and Address File

This  file  contains  the  name,  mailing  address,  and  census  of 
governments  unit  identification  code  for  every  governmental  unit 
in  the  1972  Census  of  Governments. 

12.  Planned  and  Actual  Use  Reports 

The data in these files contain information extracted from
responses to the Office of Revenue Sharing's Planned Use Report
Forms for Entitlement Periods 3 and 4 and the Actual Use Report
Forms for the period between January 1972 and June 1973.
Statistics relate to planned and actual use of revenue sharing
monies by expenditure category. Information is available for each
recipient government including Indian tribes and Alaskan native
villages.


The Continuous Work History Sample

David  A.  Hirschberg 

Social  Security  Administration 


INTRODUCTION

In administering the Social Security system, an unprecedented
volume of economic and demographic data are generated. Much of
this basic information is summarized in the monthly Social Security
Bulletin and its Annual Statistical Supplement, as well as in
special releases and reports.

Because the basic administrative records have great value for
economic and social research, it is the general policy of the Social
Security Administration (SSA) to make its data resources available
for this purpose. Such research must meet two conditions of
necessity: it must provide for safeguarding the confidentiality of
information for individuals, firms, and reporting units; and it must
be feasible without impairing the administration of the Social
Security program.

Our publication, "Some Statistical Research Resources Available
at the Social Security Administration," provides a detailed
description of the available data files, the procedures used in their
compilation, and how the files can be obtained. The overall
research program of the Social Security Administration is detailed in
the annual Work Plan of the Office of Research and Statistics.


AVAILABILITY OF RESEARCH FILES

Many of the analytical data drawn from our administrative
records are most conveniently handled by the use of samples. To
provide outside users with a general research file at modest cost,
the SSA makes available an annual 1-percent continuous work
history sample. The nature of this data file may best be understood
by seeing how the various sources of data come together.

When a person applies for a Social Security number (SSN), he
provides data on his sex, race, and date of birth, enabling us to
maintain a file of individuals by SSN, sex, race, and age. When an
employer requests an employer identification number (EIN), he
provides data on his geographic location and industry activity,
enabling us to maintain a file of employers by EIN coded to State,
county, and industry (4-digit Standard Industrial Classification or
SIC in manufacturing and 3-digit SIC in nonmanufacturing). Each
quarter, covered employers report the wages of their employees up
to the taxable limit. (Agriculture and self-employment data are
reported annually.) By matching the earnings data first to the SSN
file and then to the EIN file, we obtain for each job, by quarter,
data on sex, race, age, State, county, industry, and wages.

Most importantly, the selection of a fixed sampling pattern of
Social Security numbers for inclusion each year permits the
establishment of a work-history file for tracing employment, migration,
and earnings status of all those who worked in covered industries
and for determining their socioeconomic characteristics from 1957
to date. As new workers enter the work force, those with specified
digits enter the annual sample; as others drop out of the covered
work force, those with specified digits no longer appear.

More recently, claims data have been introduced into the file.
These, of course, are obtained from the application for benefits
and the Master Beneficiary Record, a comprehensive record for all
benefits in force.

Six files, all of which contain data for sex, race, and age, are
available to outside users. So that the files may be utilized for
longitudinal studies and the confidentiality of individuals and firms
maintained, the employee and employer identification is included
in scrambled form.

1. 1-percent annual employee-employer file. This includes
wage and salary employment reported in the reference year,
with one record for each employee-employer combination.
Basic data elements include annual and quarterly taxable wages,
total estimated wages, State and county, industry, and coverage
group (for example, farm, military, and household). This file
becomes available approximately 2 years following the year of
reference. Currently the file is available for each year from 1957
to 1971.

2. 1-percent first-quarter employee-employer file. This file
contains  the  same  data  elements  as  the  annual  file,  except  that  it 
becomes  available  about  15  months  after  the  quarter  of  refer- 
ence. Because  an  effort  is  made  to  obtain  the  file  as  quickly  as 
possible,  late  reports  are  excluded  and  coding  problems  which 
may  exist  are  not  resolved.  Excluded  are  agriculture  and  self- 
employment  data,  which  are  reported  on  an  annual  basis. 

3. 1-percent longitudinal employee-employer data (LEED)
file.  The  basic  data  elements  are  the  same  as  in  the  annual  file, 
except  that  the  records  are  skeletonized,  are  currently  available 
from  1957  to  1970,  and  are  sequenced  so  that  all  records  associ- 
ated with  an  employee  appear  together. 

4.  1-percent  annual  self-employed  file.  This  file  includes  the 
same  basic  data  elements  as  the  employer-employee  files,  but  it 
covers  net  and  taxable  earnings  for  those  who  are  self-employed. 
The basic source is the Internal Revenue Service (IRS) Schedule
SE. The earliest data available are from 1960.

5.  1-percent  1937-to-date  continuous  work  history  sample 
(CWHS).  This  file  provides  various  data  indications  from  1937 
including  years  employed,  first  and  last  years  employed,  pattern 
of  quarters  employed  for  the  last  2  years,  number  and  quarters 
of  coverage  beginning  in  1937,  patterns  of  coverage  beginning  in 
1957, farm or nonfarm wage or self-employment indicators,
taxable and self-employed earnings each year beginning in 1951,
and  insurance  status  and  benefit  information. 

6.  One-tenth  percent  1937-to-date  CWHS.  This  file  provides 
the same data as the 1-percent CWHS, but it includes a greater
level  of  detailed  earnings  information  beginning  in  1937.  There 
is  no  geographic  or  industrial  detail. 
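
As a rough illustration of how such an employee-employer file comes
together, the sketch below selects workers by a fixed pattern of SSN
digits, matches each quarterly earnings report first to a person
record and then to an employer record, and keeps one record per
employee-employer combination. It is only a sketch: the digit pattern
("07"), the field names, and the function names are hypothetical and
are not taken from SSA procedures.

```python
# Hypothetical sampling rule: keep SSNs whose last two digits fall in a
# fixed set. One ending out of 100 gives roughly a 1-percent sample.
# The actual digits used by SSA are not stated in the paper.
SAMPLE_ENDINGS = {"07"}


def in_sample(ssn: str) -> bool:
    """Return True if the SSN matches the fixed sampling pattern."""
    return ssn[-2:] in SAMPLE_ENDINGS


def build_employee_employer_file(earnings, persons, employers):
    """Build one record per (SSN, EIN) combination for sampled workers.

    earnings:  iterable of (ssn, ein, quarter, taxable_wages), quarter 1-4
    persons:   dict ssn -> {"sex", "race", "birth_year"}   (the SSN file)
    employers: dict ein -> {"state", "county", "sic"}      (the EIN file)
    """
    records = {}
    for ssn, ein, quarter, wages in earnings:
        if not in_sample(ssn):
            continue
        person = persons.get(ssn)       # first match: SSN file
        employer = employers.get(ein)   # second match: EIN file
        if person is None or employer is None:
            continue                    # unmatched reports are skipped here
        rec = records.setdefault(
            (ssn, ein),
            {**person, **employer, "quarterly_wages": [0.0] * 4},
        )
        rec["quarterly_wages"][quarter - 1] += wages
    for rec in records.values():
        rec["annual_wages"] = sum(rec["quarterly_wages"])
    return records
```

Because the digit pattern is fixed from year to year, the same workers
reappear in each annual file, which is what makes the longitudinal
(LEED) sequencing possible.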

LIMITATION  OF  CWHS 

When  administrative  data  are  used  for  analytical  purposes,  it  is 
necessary  for  the  researcher  to  be  aware  of  some  problems  and 
limitations.  These  occur  because  the  entire  labor  force  is  not 
covered  and  the  employer  reports  only  wages  up  to  the  taxable 
maximum. Moreover, there are problems of timing, of improving the
geographic and industry coding, and of sampling and nonsampling
errors in utilizing the data.

Coverage 

No  major  changes  in  the  coverage  provisions  of  the  Social  Securi- 
ty system  have  taken  place  since  1954.  Currently,  the  sample 
covers well over 90 percent of workers in paid employment. Two
types of Old Age, Survivors, Disability and Health Insurance
(OASDHI)  coverage  exist:  mandatory  and  elective.  On  a  manda- 
tory basis  are  all  employees  in  nonfarm  industries  (except  railroad 
workers),  most  farm  and  domestic  employees  who  meet  minimum 
earnings  provisions,  and  Federal  employees  not  covered  by  the 
Federal  retirement  system.  Groups  covered  on  an  elective  basis  are 
ministers,  workers  in  nonprofit  establishments,  and  State  and  local 
government  workers.  All  self-employment  is  covered  if  earnings 
exceed  $400  per  year. 

Essentially, among those with more than marginal earnings, the
Social Security system excludes 3 million Federal workers, 3
million of the 10 million State and local government workers, and
a small number of workers in nonprofit organizations, since most
elect coverage.

Wages 

Several  problems  arise  when  administrative  wage  data  are  used 
for analytical purposes. The major limitation is that the employer
reports  wages  to  the  taxable  limit  for  each  worker.  The  taxable 
limit  has  risen  steadily  since  1957  when  it  was  $4,200;  currently  it 
is  $13,200.  Because  nonfarm  wage  data  are  reported  by  quarter,  an 
estimate  of  total  wages  is  possible.  This  procedure  estimates  total 
wages  by  substituting  the  last  full  quarter  wage  for  the  quarter  in 
which  the  taxable  limit  was  reached  and  for  each  subsequent  quarter. 

For  those  workers  who  reached  the  taxable  limit  in  the  first 
quarter,  separate  annual  estimates  for  males  and  females  are  pre- 
pared for  each  year  based  on  the  Pareto  method. 
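
The quarterly substitution rule described above can be sketched as
follows. This is an illustration under stated assumptions, not SSA's
actual program: the function name is hypothetical, the limit quarter
is taken to be the first quarter in which cumulative reported wages
reach the taxable limit, and workers who reach the limit in the first
quarter are simply flagged (returned as None), since the Pareto-based
estimate for them is not specified in enough detail to reproduce.

```python
from typing import Optional, Sequence


def estimate_total_wages(
    quarterly_taxable: Sequence[float], taxable_limit: float
) -> Optional[float]:
    """Estimate total annual wages from quarterly taxable wages.

    The last full quarter's wage is substituted for the quarter in which
    the taxable limit was reached and for each subsequent quarter.
    Returns None when the limit is reached in the first quarter, because
    no full quarter exists to substitute.
    """
    cumulative = 0.0
    for i, wage in enumerate(quarterly_taxable):
        cumulative += wage
        if cumulative >= taxable_limit:
            if i == 0:
                return None  # needs a separate (Pareto-based) estimate
            last_full_quarter = quarterly_taxable[i - 1]
            quarters_to_fill = len(quarterly_taxable) - i
            return sum(quarterly_taxable[:i]) + last_full_quarter * quarters_to_fill
    return cumulative  # limit never reached: reported wages are total wages


# A worker paid 5,000 per quarter under a 13,200 limit reports
# 5,000, 5,000, 3,200, and 0; the estimate restores 20,000.
print(estimate_total_wages([5000.0, 5000.0, 3200.0, 0.0], 13200.0))  # 20000.0
```

One consequence of the assumption above is that a worker whose true
annual wages happen to equal the limit exactly is treated as having
been cut off; a production rule would need to handle that case.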


Industry  and  Geographic  Coding 

As  mentioned  previously,  industry  and  geographic  coding  data 
are  obtained  when  a  firm  applies  for  an  EIN.  On  the  same  form  is 
the  question  asking  if  this  is  a  multiestablishment  firm.  If  the 
answer  is  yes,  the  firm  is  asked  to  participate  in  our  Establishment 
Reporting  Plan  covering  multiestablishment  firms.  Ideally,  we 
would  like  to  obtain  from  each  firm  an  individual  report  for  each 
establishment.  However,  reporting  by  establishment  is  voluntary, 
and  because  of  other  priorities,  only  a  limited  number  of  tech- 
nicians are  assigned  to  deal  with  establishment  reporting  problems. 
Simply  put,  the  problem  is  one  of  editing,  reviewing,  and  correct- 
ing, if  necessary,  several  million  firm  reports  received  each  quarter. 
We  are  planning  studies  to  determine  what  effect  the  reporting 
difficulty  has  and  how  its  impact  can  be  minimized.  The  file  does 
contain  internal  coding,  each  record  indicating  how  the  industry 
and  geographic  assignment  has  been  made.  This  is  a  great  help  in 
editing  the  file  so  that  spurious  changes  can  be  identified,  the 
nature  of  the  edits  depending  upon  the  research  undertaken. 

Although  researchers  using  the  file  should  be  aware  of  these 
limitations,  the  CWHS  is  still  a  powerful  analytical  tool.  It  permits 
extensive disaggregation by sex, race, age, industry, geography,
work  force  participation,  and  earnings  levels.  It  follows  the  same 
individual  workers  over  time  so  that  quarter-to-quarter  or  year-to- 
year  changes  for  specific  individuals  can  be  observed.  Most  im- 
portant, as  Nancy  and  Richard  Ruggles1  have  pointed  out,  it  can 
be  successfully  disaggregated  to  show  the  anatomy  of  the  total 
wage  bill  as  it  relates  to  the  national  economy. 

1 Nancy and Richard Ruggles, Conference on Research in Income and
Wealth, "The Anatomy of Earnings Behavior," National Bureau of
Economic Research, 1974.

When used to examine migration, the file provides for the
development of area data on gross flows, as compared with net
flows, because the net may be an average that masks important
characteristics of two very different gross flows.

Cross-sectional  data  on  earnings  and  mobility  status  can  at  best 
be  only  partly  informative;  generally  they  are  misleading.  For 
example,  workers  who  move  from  one  area  to  another  (migrants) 
earn  less  than  nonmigrants.  However,  migrants  increase  their  earn- 
ings at  a  faster  rate  but  start  from  a  lower  base.  If  we  examine 
census  data,  the  nonmigrants  appear  to  be  the  higher  paid  group, 
and  some  economists  have  suggested,  therefore,  that  mobility  does 
not  improve  the  economic  status  of  migrants. 
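
The distinction between gross and net flows can be made concrete with
a small sketch. It assumes a longitudinal file reduced to each
worker's area of work in two successive years; the function and field
names are hypothetical.

```python
from collections import Counter


def migration_flows(work_areas):
    """Compute gross in-flow, gross out-flow, and net flow by area.

    work_areas: dict worker_id -> (area_in_year_1, area_in_year_2)
    """
    inflow, outflow = Counter(), Counter()
    for origin, destination in work_areas.values():
        if origin != destination:
            outflow[origin] += 1
            inflow[destination] += 1
    areas = set(inflow) | set(outflow)
    return {
        area: {
            "in": inflow[area],
            "out": outflow[area],
            "net": inflow[area] - outflow[area],
        }
        for area in areas
    }


# Two workers leave area A and two arrive: net migration is zero even
# though four moves took place.
flows = migration_flows({
    "w1": ("A", "B"),
    "w2": ("A", "B"),
    "w3": ("B", "A"),
    "w4": ("B", "A"),
    "w5": ("A", "A"),
})
print(flows["A"])  # {'in': 2, 'out': 2, 'net': 0}
```

Because the file follows the same individuals, the in-movers and
out-movers behind a zero net figure can be tabulated separately by
age, industry, or earnings.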

MEETING  OF  CWHS  USERS 

A  CWHS  Users  Conference  is  held  each  year.  Invitations  are 
extended  to  anyone  interested  in  the  problems  of  handling  the  data 
files,  to  those  who  wish  to  report  on  recent  research  findings,  and 
to  those  involved  in  preparing  the  file  in  SSA.  These  meetings  over 
the  last  several  years  have  provided  users  of  the  data  with  a  better 
understanding  of  the  work  currently  underway  and  have  enabled 
the  producers  to  discuss  several  of  the  new  developments  relating 
to  the  file  and  methods  to  improve  it. 

DATA  DEVELOPMENT 
The  10-Percent  Sample 

The 1-percent CWHS currently available has important limitations
when used for studying small standard metropolitan statistical
areas (SMSA's) and rural areas. The initial impetus for the
development  of  a  10-percent  sample  file  came  from  the  Office  of 
Management  and  Budget  as  a  result  of  urgent  needs  for  intercensal 
population  estimates  for  revenue  sharing  and  other  programs,  and 
because  of  the  decision  not  to  conduct  a  mid-decade  census.  At 
present, the Department of Housing and Urban Development is providing
the bulk of the funding, with other Federal agencies making
significant contributions.

The  data  base  for  this  file  will  be  similar  to  the  first-quarter  files 
described earlier and will include those working in the first quarter
of  1971  and  1973.  In  the  United  States  during  the  first  quarter  of 
1971,  approximately  73  million  workers  held  80  million  jobs. 
Therefore,  the  1971  file  will  contain  records  for  7.3  million 
workers  and  8.0  million  jobs.  The  1973  file  will  be  approximately 
6 percent larger.

For  each  year,  the  records  will  be  summarized  so  that  the 
industry  and  place  of  work  information  is  available  for  the  major 
job. The file will be merged, indicating for each individual sex,
race, age, whether his employer changed, and, for 1971 and 1973,
geography, industry, and wages. The file will be sorted so that
tabulations by State and county for both years will be available at
modest cost.

The  10-percent  sample  would  constitute  a  significant  asset  for 
regional  analysis.  No  other  source  of  data  could  provide  insight 
into  the  structure  of  a  local-area  labor  force,  so  that  employment 
distributions  by  sex,  race,  age,  wages  and  wage  changes,  work  force 
participation,  industry,  and  regional  migration  patterns  could  be 
analyzed  systematically. 
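
The summarizing and merging steps planned for the 10-percent file
might look roughly like the sketch below. Several points are
assumptions made for illustration only: the paper does not define the
"major job," so it is taken here to be the job with the highest
first-quarter wages; workers present in only one year are kept with a
missing entry for the other; and all names are hypothetical.

```python
def major_job(jobs):
    """Pick the major job; assumed here to be the one with highest wages."""
    return max(jobs, key=lambda job: job["wages"])


def merge_years(persons, jobs_by_year, years=(1971, 1973)):
    """Merge per-year job records into one record per individual.

    persons:      dict ssn -> {"sex", "race", "age"}
    jobs_by_year: dict year -> dict ssn -> list of job dicts, each with
                  "ein", "state", "county", "sic", and "wages"
    """
    merged = {}
    first, last = years
    for ssn, person in persons.items():
        record = dict(person)
        for year in years:
            jobs = jobs_by_year.get(year, {}).get(ssn)
            record[year] = major_job(jobs) if jobs else None
        if record[first] and record[last]:
            record["employer_changed"] = record[first]["ein"] != record[last]["ein"]
        else:
            record["employer_changed"] = None  # not observed in both years
        merged[ssn] = record
    return merged
```

Sorting the merged records by the State and county of the major job in
each year would then give the inexpensive two-year tabulations the
paper anticipates.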

Occupational  Data 

In  addition,  a  detailed  proposal  has  been  prepared  to  test  the 
feasibility  of  adding  occupation  as  a  standard  data  item  to  the 
CWHS. It is contingent on IRS cooperation. The approach is to use
the  IRS  1040  occupation  information  supplemented  by  followups 
with employees and/or employers when necessary. The proposal is
under serious consideration with other statistical agencies and is
planned  in  two  stages.  The  first  stage  will  be  a  feasibility  and 
quality  analysis  based  on  a  sample  of  15,000  CWHS  wage  earners. 
It  will  be  designed  to  examine  the  cost  and  operational  feasibility 
of  this  approach  and  to  explore  issues  of  quality— in  particular,  the 
extent  to  which  occupation  entries  on  the  IRS  1040  can  be  used 
for  classification  beyond  the  major  group  level. 

If  the  pilot  project  indicates  that  this  approach  to  the  collection 
of  occupational  data  appears  feasible  and  produces  data  of  accept- 
able quality,  the  project  would  go  into  a  second  phase,  expanded 
to another 60,000 workers, or a 1-in-1,000 sample.


Place-of-Residence  Data 

In  another  project,  a  modified  version  of  the  Census  Bureau's 
address  reference  file  is  being  used  to  automatically  assign  geo- 
graphic codes  to  the  1972  CWHS.  This  operation  will  provide  for 
place-of-work  and  place-of-residence  comparisons,  also  facilitating 
the  editing  of  the  file. 


CONCLUSION 

Ideally  a  research  file  of  this  scope  should  contain  additional  in- 
formation. Occupation  and  place  of  residence  have  already  been 
mentioned.  Timeliness  of  the  data  has  been  improved  with  the 
availability  of  first-quarter  files.  In  terms  of  coverage,  efforts  will 
be  explored  to  provide  data  for  the  noncovered  portion  of  the 
CWHS.  There  is  a  need  to  incorporate  data  from  the  railroad  retire- 
ment system  and  the  Civil  Service  Commission.  In  addition, 
educational  attainment,  hours  of  work,  marital  status,  unemploy- 
ment, and  noncovered  wages  are  important  variables  to  consider. 

In  conclusion,  the  operation  of  the  Social  Security  system 
produces  a  vast  and  unique  body  of  longitudinal  data  on  earnings 
and  on  retirement  and  disability  claims  and  benefits  for  persons 
classified  by  age,  race,  and  sex.  It  has  been  our  policy  to  make  the 
data  available  to  social  scientists.  Over  the  past  year,  administrative 
and  research  agencies  of  government  have  been  extremely  helpful 
in  moving  some  of  these  research  efforts  forward,  and  we  are  grate- 
ful. In  undertaking  these  projects,  we  will  always  be  careful  to 
safeguard  and  protect  the  confidentiality  of  information  relating  to 
individuals.


Comments on Problems and Opportunities with Social Security Sample Data

Kathryn  P.  Nelson 

Oak  Ridge  National  Laboratory 


INTRODUCTION 

In  his  presentation,  David  Hirschberg  has  given  a  balanced  and 
comprehensive  summary  of  the  different  forms  in  which  sample 
data  are  available  from  the  Social  Security  insurance  program. 
After  reviewing  the  data's  source  as  an  administrative  byproduct, 
he  has  well  introduced  the  limitations  for  analytical  and  descriptive 
purposes  imposed  by  imperfections  in  coverage,  in  definitions  of 
wage  and  location,  and  in  timeliness.  Of  course,  actual  users  of  the 
data  must  consider  the  effects  of  such  imperfections  in  greater 
detail.  Although  he  did  not  elaborate  on  the  usefulness  of  the  file 
for  studying  the  composition  and  migration  of  the  labor  force,  I 
am  glad  he  emphasized  recent  progress  in  the  Social  Security 
Administration's  (SSA)  continuing  effort  to  improve  and  extend 
the  data.  News  of  the  availability  of  a  10-percent  sample  is  partic- 
ularly welcome,  since  such  an  enlarged  sample  will  greatly  increase 
the  geographic  specificity  possible  with  Social  Security  data  as  well 
as  the  capability  to  study  migration  and  labor  mobility  of  specific 
subpopulations  of  interest  within  areas  as  small  as  counties. 

With  his  paper  as  necessary  background,  I  will  consider  the  uses 
and  opportunities  presented  by  this  data  source,  particularly  what  I 
regard  as  the  data's  two  most  important  features— the  geographical 
and  longitudinal  detail  available.  I  will  mention  problems  with  the 
sample  data  as  I  discuss  data  improvements  in  some  detail  and,  as  a 
representative  of  the  "never-satisfied"  user  community,  urge  still 
more  improvements.  However,  I  wish  also  to  emphasize  the  ad- 
vantages, as  well  as  the  limitations,  that  stem  from  the  use  of 
administrative  data.  Social  Security  data  are  remarkably  cheap 
when  compared  to  survey  or  census  data,  well  maintained  and 
continuously  evaluated,  quite  consistently  defined,  and  notably 
free  from  nonresponse  and  recall  errors.  The  constant  concern  of 
the  SSA  for  protecting  the  confidentiality  of  the  data  is  also  an 
important  advantage.  For  such  reasons,  the  data  appear  a  natural 
candidate  for  improvements  to  achieve  maximum  usefulness  for 
research  purposes  at  minimum  cost. 

Important  substantive  advantages  of  Social  Security  data  stem 
from  the  degree  of  geographic  specificity  made  possible  by  the 
large  sample  size  and  from  the  longitudinal  coverage  of  the  same 
people  through  job,  wage,  and  geographic  mobility.  These  two  fea- 
tures are  the  most  valuable  in  that  they  are  available  together  to 
the  user.  I  will  discuss  both  their  optimal  research  uses  and  the 
priorities  for  data  development  suggested  by  these  opportunities. 
Some  points  will  be  illustrated  with  preliminary  results  from  my 
present  research,  which  seeks  to  determine  if  the  relative  under- 
coverage  of  young  adults  by  Social  Security  data  (because  of  their 
low  labor  force  participation  and  high  unemployment  rates) 
systematically  distorts  Social  Security  measures  of  gross  and  net 
migration. (See references 8 and 9.)


GEOGRAPHIC  DETAIL 

Social  Security  data  provide  more  detail  on  geographic  location 
than  any  other  microdata  source,  even  the  Census  Public  Use 
Sample, although, as for all samples, one must be concerned with the
degree  of  statistical  reliability  accompanying  small  sample  size.  The 
location  of  each  employer  is  coded  to  State  and  county  (reference 
12)  providing  direct  information  about  the  location  of  workplace 
(in  the  absence  of  reporting  errors)  and  indirect  inference  about 
the  location  of  residence.  Thus,  users  in  relatively  small  areas  can 
study  the  characteristics  and  history  of  their  labor  force  with 
adequate  confidence  in  the  estimates  provided  by  the  data.  (See 
reference  10.)  Over  time,  labor  force  movement  can  be  measured 
directly,  and  residential  migration  among  major  regions  and 
economic  areas  can  be  inferred  (see  reference  6),  although  such 
inference  becomes  more  difficult  for  metropolitan  areas  with 
commutersheds  that  cross  State  lines.  (See  reference  15.)  Such 
commuting  problems  are  presently  one  of  the  major  difficulties  in 
determining  what  migration  is  actually  measured  by  the  file. 
Commuting  problems  also  lessen  the  usefulness  of  Social  Security 
estimates  for  policy  issues,  since  administrative  units  are  usually 
defined  by  place  of  residence. 

The  distinct  advantages  of  Social  Security  data  for  studying 
migration  include  their  provision  of  annual  measures  of  gross 
migration  which  can  be  associated  with  wage  and  industry  changes 
(see  reference  13),  with  some  individual  demographic  character- 
istics, and  with  characteristics  of  place  as  well.  (See  reference  11.) 
Such  advantages  in  conjunction  with  the  continuing  need  for  better 
migration  data  for  population  estimation  and  prediction  suggest 
that  further  improvements  in  the  geographic  detail  in  the  sample 
should  be  a  high  priority.  The  news  that  county  of  residence  will 
be  introduced  on  the  1972  CWHS  is  accordingly  most  welcome. 
Because  of  the  potential  usefulness  of  this  residential  information 
both  for  identifying  instances  of  "spurious"  work  force  migration 
in  the  file  and  for  directly  relating  work  force  and  residential 
migration,  researchers  should  move  quickly  to  evaluate  its  quality 
and  its  implications.  I  hope  that  place  of  residence  will  become  a 
regular  feature  of  the  file.  In  addition,  I  would  urge  strengthening 
of  SSA  efforts  to  detect  and  correct  instances  of  erroneous  report- 
ing of  county  of  work.  Lags  in  reporting  change  in  employment 
location  and  inaccurate  reporting  of  separate  establishments  from  a 
central  office  will  be  increasingly  unacceptable  as  use  of  the  data 
increases. 
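
To suggest how residential information might help screen for
"spurious" work force migration, the sketch below compares changes in
county of work with changes in county of residence between two years.
The classification rule and its labels are my own hypothetical
illustration, not an SSA editing procedure; a change in county of
work with no change in residence is only flagged for review, since it
may reflect a real job change within commuting range as easily as a
reporting change.

```python
def classify_move(work_prev, work_curr, res_prev, res_curr):
    """Classify a worker's year-to-year change using work and residence counties."""
    work_moved = work_prev != work_curr
    res_moved = res_prev != res_curr
    if work_moved and res_moved:
        return "work and residence moved"
    if work_moved:
        return "work moved only: review"  # possible spurious migration
    if res_moved:
        return "residence moved only"
    return "no move"


# County of work changes from 29189 to 17031 while residence stays put.
print(classify_move("29189", "17031", "29189", "29189"))
```

Tabulating the flagged cases by employer would show whether they
cluster in a few multiestablishment firms, which is where reporting
from a central office is most likely to be the cause.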

Such  improvements  in  geographic  detail  will  substantially 
improve  opportunities  for  use  of  the  file  in  several  research  areas. 
Most  exciting  will  be  the  implications  of  good  information  on  place 
of  residence  for  studying  work-residence  links.  The  only  presently 
available  data  on  this  topic  are  local  traffic  surveys  and  the  1960 
and  1970  censuses  which  are  flawed  by  noncodable  or  nonexistent 
responses.  An  annual  intercensal  source  of  data  on  both  work  and 
residence  location  should  provide  many  good  opportunities  for 
studying  much  more  than  commuting;  for  instance,  changes  in 
urban  structure,  processes  of  metropolitan  economic  development, 
and  vital  linkages  between  different  parts  of  regions— rural, 
suburban,  and  urban. 

Good  place-of-residence  data  would  also  radically  improve 
Social  Security  measures  of  migration  by  removing  a  major  source 
of  uncertainty  in  the  data.  Not  only  would  residential  migration  of 
the employed labor force be directly measurable, but the
relationship between labor force and total population migration would
probably  be  easier  to  determine.  Indeed,  Social  Security  data  may 
even  become  a  preferred  source  of  studies  of  migration,  because  of 
census problems with undercoverage of blacks, nonresponse to
migration questions, and recall haziness about previous place of
residence. (See reference 3.) My present research, which is trying to
study  equivalent  populations  by  concentrating  on  the  migration  of 
the  labor  force  working  in  particular  labor  markets,  shows  Social 
Security  rates  of  in-migration  for  males  to  be  substantially  higher 
than  those  indicated  by  the  Census  Bureau  for  equivalent  groups, 
although  similar  in  terms  of  age,  wage  level,  and  place  of  origin. 
These  findings  suggest  that  Social  Security  data  can  be  used  inter- 
censally  to  indicate  the  extent  and  direction  of  migration  that  a 
full  census  would  have  shown,  with  as  reliable  demographic  detail, 
and  in  particular  more  information  on  the  migration  of  blacks.  (See 
reference  9.) 

A  final  "far-out"  suggestion:  Zitter  and  colleagues  (references 
15  and  16)  have  shown  that  black  South-to-North  migration 
appears  to  be  poorly  represented  by  the  Social  Security  file.  This 
may  be  due  to  differential  coverage  by  industry,  in  particular 
undercoverage  of  agriculture,  so  that  rural  South  to  urban  North 
migrants  may  well  presently  appear  first  as  "new  entrants"  in  the 
North.  Information  on  lifetime  migration,  in  particular  on  State  of 
birth,  could  reduce  such  problems.  State-of-birth  information  is 
given  at  the  time  of  application  for  a  Social  Security  number  and  is 
thus  available  on  the  same  form  from  which  age,  sex,  and  race  are 
presently  coded.  Although  the  problems  for  SSA  of  retroactively 
coding  State  of  birth  may  be  insurmountable  for  those  presently  in 
the  file,  perhaps  the  SSA  could  consider  routinely  coding  this  item 
for  new  accounts.  Whatever  information  is  available  on  place  of 
birth  for  foreign  immigrants  would  also  be  valuable  for  future 
studies. 

LONGITUDINAL  COVERAGE 

With  annual  information  on  industrial  and  geographic  work 
history  back  to  1957  available  in  the  LEED  file,  and  longitudinal 
earnings  and  work  experience  history  back  to  1937,  Social  Security 
data  are  uniquely  suited  for  studying  the  dynamics  of  the  labor 
force.  Although  one  would  prefer  knowledge  of  wage  rates  to 
annual  estimates  of  total  wages,  the  processes  through  which 
workers  change  earning  levels  can  be  studied  in  unparalleled  detail. 
Workers  can  be  followed  through  both  geographic  and  industrial 
movement,  with  information  available  on  multiple  jobs,  tenure, 
and  past  labor  force  attachment  and  experience.  (See  references  1, 
3,  4,  and  5.) 

Because  of  the  suitability  of  the  data  for  this  area  of  research,  I 
would  urge  that  high  priority  be  given  to  extending  the  coverage  of 
the  file  in  several  directions.  The  aim  would  be  to  reduce  move- 
ment into  and  out  of  the  sample  so  that  the  researcher  can  know 


more  specifically  what  absence  from  the  sample  implies.  Merging 
data  from  the  railroad  retirement  and  Civil  Service  insurance 
systems  with  the  file  would  be  a  major  step  forward  to  increase 
industrial coverage, and it appears most straightforward. A
significant improvement would result if some index of unemployment
(perhaps, for instance, for those receiving unemployment benefits)
could be devised so that unemployment could be distinguished
from  leaving  the  labor  force.  Finally,  indication  of  retirement  from 
OASDI  claims  data  on  the  LEED  file  for  those  receiving  retirement 
benefits  would  be  most  useful.  (And  place  of  retirement,  if  avail- 
able, could  give  valuable  information  on  the  movement  of  an 
increasingly  important  migratory  group.) 

Such  improvements  would  appear  to  make  the  sample  yet  more 
useful  for  studying  manpower  development  efforts  and  programs, 
movement  into  and  out  of  poverty,  and  employment  patterns  of 
groups  with  unemployment  and  underemployment  problems.  In  this 
regard,  the  possible  addition  of  occupational  data  to  the  file  should 
be  an  important  improvement  for  the  usefulness  of  the  data  for  re- 
search in  the  area  of  manpower,  since  the  lack  of  some  proxy  for 
education  or  human  capital  has  been  sorely  felt.  (See  reference  7.) 


SUMMARY 

I  am  encouraged  by  the  continuing  development  of  a  valuable 
data  resource.  The  Social  Security  sample  presents  clear  opportuni- 
ties for  productively  studying  important  current  issues  such  as 
poverty,  migration,  manpower  development,  and  regional  eco- 
nomic growth.  Information  about  place  of  residence  and  the 
extension of sample size to 10 percent represent substantial
improvements,  for  which  the  Social  Security  Administration  and 
David  Hirschberg  deserve  credit  and  thanks.  I  foresee  that  remain- 
ing problems  with  coverage,  timeliness,  and  inaccurate  reporting 
will decrease as use of the data increases.


INTRODUCTION

Most  of  the  data  published  by  the  Bureau  of  the  Census  is 
summary  data:  counts  of  persons,  housing  units,  businesses,  etc., 
with particular characteristics, in the Nation or in a given geographic
area. With increasing frequency, data users are asking for
data in a form which offers them more flexibility: data disaggregated
back to the original reporting unit so that it may be resummarized
or tabulated to meet the researcher's own particular
need. 

With regard to such demand, the Bureau of the Census has a
peculiar problem. It is required by law to maintain the confidentiality
of information which could be associated with a specific
respondent. The Bureau has for some time offered to create special
summary tabulations customized to the needs of a user at his
expense. But the need articulated here requires more timely and
less costly service than the Bureau can provide on a customized
basis. In view of the need, the Bureau of the Census has worked
out a way to make unaggregated data available, subject to limitations
of content designed to preclude the identifiability of any
specific respondent and, of course, subject to available funding.

In  this  paper  we  will  use  the  term  "microdata"  to  refer  to 
unaggregated  records  for  individual  respondents  in  a  census  or 
survey,  made  available  in  a  form  usable  outside  the  Census  Bureau, 
and  in  a  form  modified  to  avoid  identification  of  the  respondents. 
The  Bureau  has  a  number  of  such  files  available,  and  several  of 
them  are  fairly  specialized  in  nature.  For  instance  the  Census 
Employment  Survey  File  deals  only  with  populations  in  certain 
specified  low-income  areas.  The  Truck  Inventory  and  Use  Survey 
also  has  a  rather  specialized  clientele. 

The most widely known and used of the Census Bureau's public
use microdata files are the Public Use Samples of basic records
from the 1960 and the 1970 censuses. These files provide the
greatest range of demographic, socioeconomic, and housing variables;
the entire country is covered and no segment of the population
is left out. The large size of these files makes them usable even
for relatively small areas (States, large SMSA's). The major limitation
for most research is the restriction of the time series to two
points, April 1960 and April 1970, with 7 or 8 years to wait
until the third point is available.

That  is  where  the  Annual  Demographic  File  comes  in.  The 
Annual  Demographic  File  (ADF)  is  the  name  applied  to  the  public 
use  microdata  version  of  the  March  Current  Population  Survey 
(CPS).  As  in  the  Census  Public  Use  Samples  all  segments  of  the 
U.S.  population  are  covered,  and  again  there  is  a  considerable  range 
of  demographic  and  socioeconomic  variables.  Unlike  the  Census 
Public  Use  Samples,  the  time  series  becomes  annual,  with  six  data 
points already available, from 1968 to 1973. On the other hand, the
small sample size (relative to the census) of 1-in-1,400 (approximately
50,000 households) severely restricts the potential for
subnational analysis.
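
To make the sample-size limitation concrete, the following sketch estimates how many sample households would fall in an area of a given size at the 1-in-1,400 rate, and the resulting standard error of an estimated proportion. It assumes simple random sampling, which understates the variability of the clustered CPS design; the function names and the 350,000-household area are illustrative assumptions, not figures from the survey documentation.

```python
import math

SAMPLING_RATE = 1 / 1400  # overall sampling fraction cited in the text


def expected_sample_households(area_households):
    """Expected number of sample households in an area of the given size."""
    return area_households * SAMPLING_RATE


def srs_standard_error(p, n):
    """Standard error of a proportion p estimated from n units,
    under simple random sampling (an assumption; CPS is clustered)."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical metropolitan area with 350,000 households.
n_area = expected_sample_households(350_000)
se_area = srs_standard_error(0.10, n_area)

# National sample of roughly 50,000 households.
se_nation = srs_standard_error(0.10, 50_000)

print(f"sample households in area: {n_area:.0f}")
print(f"SE of a 10% proportion, area:   {se_area:.4f}")
print(f"SE of a 10% proportion, nation: {se_nation:.4f}")
```

Even under this optimistic assumption, the area-level standard error is more than ten times the national one, which is the sense in which subnational analysis is restricted.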


THE  CURRENT  POPULATION  SURVEY 

Anyone who pays attention to national news is aware of monthly
statistics on the rate of unemployment and changes in it. Providing
data for this purpose is the primary task of the monthly
Current Population Survey. The Census Bureau collects the data;
the Bureau of Labor Statistics has principal responsibility for
analysis and release of that data. Accordingly, most of the
information collected in the CPS consists of labor force items
and the demographic material that supports that primary purpose.

CPS also serves as a vehicle for supplemental inquiries, periodically
added to the questionnaire. In recent years, supplemental
inquiries  have  included  questions  on  immunization  against  selected 
diseases,  school  enrollment,  recent  college  graduates,  voting  in 
national  elections,  participation  in  adult  education,  marital  history, 
fertility,  work  experience,  and  income.  Supplements  to  the  CPS 
may  be  recurring  or  they  may  be  one-time  supplements  conducted 
for  a  special  purpose.  Examples  of  recurring  supplements  include 
the  work  experience  and  income  supplement  collected  in  March, 
the  multiple  job-holding,  premium-pay  supplement  collected  in 
May,  the  immunization  survey  in  September,  the  school  enrollment 
supplement  collected  in  October,  and  the  survey  of  hired  farm 
wage  workers  in  December.  Supplementary  data  collected  through 
the  survey  are  analyzed  and  published  by  the  various  sponsors.  In 
addition,  the  Census  Bureau  publishes  the  Current  Population 
Reports based on the CPS. These include the P-20 Series, Population
Characteristics; the P-23 Series, Special Studies; and the P-60 Series,
Consumer Income.


THE  ANNUAL  DEMOGRAPHIC  FILE 


The supplement of greatest general interest is the March supplement,
with detailed income and work history information as the
most  significant  additions.  While  it  is  used  in  generating  a  large 
number  of  the  Current  Population  Reports  of  summary  data,  our 
primary  interest  here  is  that  it  is  also  the  sample  disseminated  in 
microdata  form  as  the  Annual  Demographic  File. 


Subject  Content 


The ADF contains most of the demographic and socioeconomic
subject matter generally associated with census population data:
age, sex, race, ethnic origin, family characteristics, educational
attainment, mobility, employment, and income, to hit the highlights.
Included also are work history items dealing with the
periods of unemployment, part-time work, job seeking, and
absence from the labor force during the previous year and the
reasons involved; also labor union membership (1971 only), and a
more detailed treatment of income from miscellaneous sources
than is available from the 1970 census. On the other hand, a researcher
familiar with the Census Public Use Samples would find
absent any data on housing, farm or other rural residence, nativity
and place of birth or country of origin, children ever born, activity
5 years ago, current school enrollment, place of work, and vocational
training.

Subject  content  is  relatively  constant  from  year  to  year.  There 
will  inevitably  be  some  changes  as  new  questions  are  introduced 
and  others  dropped.  A  list  of  titles  for  the  various  subject  items  in 
the  ADF  is  presented  as  appendix  A  to  this  paper. 

Geographic Content

There are two kinds of limitations to geographic content. As
indicated earlier, the usefulness of the Annual Demographic File
for subnational data such as for States or SMSA's is severely
limited by the 1-in-1,400 sample size. A second limitation is
introduced by the Bureau's confidentiality requirements, and it is
primarily these confidentiality restrictions that are reflected in
the geographic areas identified in the ADF. The nature of the work
done by each individual investigator will determine to what extent
his requirement for precision allows using some of the smaller
areas identified on the file.

Type of residence is indicated nationwide to the extent of
metropolitan, central city, and nonmetropolitan residence codes.
Nine  standard  census  divisions  (groups  of  States)  are  identified. 
Most  of  the  large  States  within  those  groups  and  a  number  of  the 
largest  SMSA's  have  also  been  identified  on  the  files,  but  the  list  is 
slightly  different  for  two  time  periods:  before  and  after  the 
1972-to-1973  redesign  of  the  CPS  based  on  the  1970  census.  That 
redesign  affected  the  size  of  the  sampled  populations  in  certain 
geographic  categories,  which  in  turn  affected  the  application  of 
disclosure  rules.  Starting  with  the  1973  file,  and  for  future  files, 
SMSA's  of  1  million  or  more  population  are  identified,  34  in  all. 
Only  19  of  these  appear  on  files  for  1968  to  1972.  Comparability 
of  1972  data  to  1973  data  for  those  19  SMSA's  is  inhibited  by  the 
fact  that  prior  to  1973  SMSA's  are  defined  in  terms  of  their 
boundaries  as  of  the  1960  census.  Nine  of  the  19  SMSA's  expanded 
their  boundaries  between  the  1960  and  the  1970  censuses,  and  it  is 
the  1970  boundaries  that  will  be  reflected  on  1973  and  future  files. 
Twelve  large  States  are  being  identified  for  1973  and  hereafter. 
Prior to 1973, 18 States and 11 additional groupings were identified.
Specific States and SMSA's identified are listed in appendix B.

The  sample  design  and  methods  of  weighting  CPS  data  make  it 
nicely  conducive  to  producing  estimates  for  the  Nation,  for  the 
four  major  regions,  and  for  SMSA's,  though  of  course  the  smaller 
the  area  the  more  sampling  variability  will  be  a  problem.  CPS 
sample  design  is  less  conducive  to  estimating  data  for  identified 
States  and  groups  of  States  because  the  strata  from  which  primary 
sampling  units  are  drawn  cross  State  lines.  Documentation  of  the 
Annual Demographic File includes an appendix discussing problems
of using ADF data for SMSA's and States.

Unique  Advantages 

Aside  from  the  obvious  improvement  in  time  series,  the  Annual 
Demographic  File  has  two  distinct  advantages  over  Census  Public 
Use  Samples:  reduction  of  nonsampling  errors  and  matchability  of 
part  of  the  sample  from  year  to  year. 

Enumerators used for the Current Population Survey are professionals,
benefiting from much greater training than can be
accorded one-time census enumerators. The whole operation is less
massive and is under better control than a decennial census. Allocations
for missing or inconsistent data are required much less frequently
in the CPS than in the decennial operation. Thus, the major
sources of nonsampling error are substantially reduced, and in that
sense certain detailed characteristic data from the CPS may be
slightly more accurate than their counterparts in the census. On the
other hand, the potential for enumerator bias may be greater, and
of course sampling variability is much larger.

The second unique advantage is year-to-year linkage. Each
sampled household is followed over a 16-month period in CPS
(interviews in 4 consecutive months, then 8 months out of the
sample, then interviews in 4 more months 1 year after the first 4
months). For each matchable household, then, a longitudinal observation
of two points a year apart is possible. While theoretically
one-half of each year's sample would match with units in the
previous year's sample and the other half would link up with units
in the next year's sample, the match rate is usually about two-thirds
of that. Most of the dropouts are people who move during
the year; others result from noninterviews or errors in the response,
enumeration, or processing phases. Matching across the redesign
period of 1972 to 1973 is also not possible. Methodological potentials
and problems in studying year-to-year change and extrapolating
conclusions to the population as a whole are, however,
beyond the scope of this paper.¹
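
The 4-8-4 rotation pattern described above can be checked with a short sketch. It enumerates the rotation groups interviewed in a given March and in the following March and confirms that half of them overlap; applying the roughly two-thirds match rate quoted in the text then leaves about one-third of a year's sample linkable to the adjacent year. Month indexes and function names here are illustrative assumptions, not part of any CPS documentation.

```python
def months_in_sample(entry_month):
    """Months in which a household entering in `entry_month` is interviewed:
    4 months in, 8 months out, then 4 more months in."""
    first_stint = range(entry_month, entry_month + 4)
    second_stint = range(entry_month + 12, entry_month + 16)
    return set(first_stint) | set(second_stint)


def rotation_groups_in(month):
    """Entry months of all rotation groups interviewed in `month`."""
    return {
        entry
        for entry in range(month - 15, month + 1)
        if month in months_in_sample(entry)
    }


march = 100  # arbitrary index standing for a March survey month
this_year = rotation_groups_in(march)
next_year = rotation_groups_in(march + 12)
overlap = this_year & next_year

theoretical_share = len(overlap) / len(this_year)  # 4 of 8 groups
typical_share = theoretical_share * 2 / 3          # "about two-thirds of that"

print(f"groups in sample: {len(this_year)}, shared with next March: {len(overlap)}")
print(f"theoretical match share: {theoretical_share:.2f}")
print(f"typical match share:     {typical_share:.2f}")
```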

Technical  Characteristics 

Two  types  of  records  are  present  in  the  Annual  Demographic 
File: family records and person records, which follow the corresponding
family records. Unrelated individuals each have their own
"family"  record  and  person  record,  though  the  sequence  is  such 
that  an  unrelated  individual  in  a  household  could  be  associated 
with  other  members  of  that  household. 

Each  family  record  and  each  person  record  carries  a  variable 
weight  necessary  for  producing  estimates,  unlike  the  uniformly 
weighted  Census  Public  Use  Samples. 
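
As a rough illustration of how such a file might be read, the sketch below groups each family record with the person records that follow it and computes a weighted mean of family income. Character locations are taken from appendix A (family income in F73-78, family weight in F205-216, age in P29-30). The record-type codes, the implied decimal places in the weight, and the assumption that the tape has already been decoded to text lines are hypothetical and would need to be confirmed against the file documentation.

```python
from dataclasses import dataclass, field

RECORD_LENGTH = 360    # both record types run to character 360 (appendix A)
FAMILY_TYPE = "1"      # hypothetical record-type code
PERSON_TYPE = "2"      # hypothetical record-type code
WEIGHT_SCALE = 100.0   # hypothetical: weight stored with two implied decimals


def field_at(record, first, last):
    """Return characters first..last using 1-based locations as in appendix A."""
    return record[first - 1:last]


@dataclass
class Family:
    income: int
    weight: float
    ages: list = field(default_factory=list)


def read_families(lines):
    """Attach each person record to the family record that precedes it."""
    families = []
    for line in lines:
        record = line.rstrip("\n").ljust(RECORD_LENGTH)
        kind = field_at(record, 1, 1)
        if kind == FAMILY_TYPE:
            families.append(Family(
                income=int(field_at(record, 73, 78)),
                weight=int(field_at(record, 205, 216)) / WEIGHT_SCALE,
            ))
        elif kind == PERSON_TYPE and families:
            families[-1].ages.append(int(field_at(record, 29, 30)))
    return families


def weighted_mean_income(families):
    """Weighted mean of family income; every record carries its own weight."""
    total_weight = sum(f.weight for f in families)
    return sum(f.income * f.weight for f in families) / total_weight


def make_record(kind, fields):
    """Build a synthetic 360-character record, for demonstration only."""
    chars = [" "] * RECORD_LENGTH
    chars[0] = kind
    for (first, last), value in fields.items():
        chars[first - 1:last] = str(value).rjust(last - first + 1, "0")
    return "".join(chars)


sample = [
    make_record(FAMILY_TYPE, {(73, 78): 8000, (205, 216): 140000}),
    make_record(PERSON_TYPE, {(29, 30): 41}),
    make_record(PERSON_TYPE, {(29, 30): 9}),
    make_record(FAMILY_TYPE, {(73, 78): 12000, (205, 216): 280000}),
    make_record(PERSON_TYPE, {(29, 30): 67}),
]
families = read_families(sample)
mean_income = weighted_mean_income(families)
print(f"families read: {len(families)}, weighted mean income: {mean_income:.2f}")
```

Because the weights vary from record to record, an unweighted mean of the two synthetic incomes (10,000) would differ from the weighted figure; with the uniformly weighted Census Public Use Samples the two would coincide.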

Public  use  computer  tapes  containing  the  Annual  Demographic 
Files  for  the  years  1968  through  1973  are  now  available  on  four 
reels  for  each  year,  and  at  the  current  price  of  $80  per  reel  that 
comes  to  $320  per  file.  The  tapes  are  IBM-compatible  labeled 
tapes,  7-  or  9-track,  in  BCD,  EBCDIC,  or  ASCII  recording  modes. 

Production  of  future  files  will  be  put  on  a  regular  basis,  similar 
in  format  to  existing  files,  releasable  6  to  8  months  after  the  survey 
date,  or  roughly  toward  the  end  of  the  same  calendar  year.  Data 
for  years  prior  to  1968  have  not  been  released  in  a  standardized 
form  due  to  funding  and  certain  technical  problems. 

OTHER  CPS  PUBLIC  USE  FILES 

A standardized format for the other CPS supplements is less
certain. Most of the supplemental inquiries are modified from time
to time, which makes the investment in programming for a standard
format subject to considerable risk. However, from time to
time, special CPS files are made available in public-use format.

1972  CPS  Voting  Supplement 

A public-use version of the November 1972 CPS Voting Supplement
is one such file. This file contains records for persons 18
years  old  and  over,  including  information  about  voting  status  in 
1972  as  well  as  social  and  economic  characteristics.  Geographic 
codes  on  the  records  identify  States.  The  file  is  contained  on  a 
single  reel  of  tape. 


¹ Longitudinal matching of CPS records is discussed in two papers: U.S.
Bureau of the Census, Technical Paper No. 31, Consistency of Reporting of
Ethnic Origin in the Current Population Survey, available from Superintendent
of Documents, U.S. Government Printing Office, Washington, D.C.
20402, for 90 cents; and The Creation of Longitudinal Data from
Cross-Sectional Surveys: An Illustration from the Current Population Survey
by Terence F. Kelly, available from the author at the Urban Institute, 2100
M Street, N.W., Washington, D.C. 20037.

User-Sponsored Custom Files

Tape  files  from  other  CPS  supplements  can  be  produced  to  meet 
the  specifications  of  an  outside  sponsoring  group  as  long  as  the 
provisions  of  Title  13  (U.S.  Code)  regarding  confidentiality  of  the 
individual's  report  are  not  violated.  This  may  involve  the  deletion 
of  certain  highly  identifiable  characteristics  and  will  always  involve 
the identification of only broad areas of residence. Specifically, no
area is identifiable unless it has a population of at least 250,000
included  within  areas  subject  to  sampling  beyond  the  first  stage. 
The  cost  for  files  which  are  "tailor-made"  to  user  specifications  has 
averaged  in  the  past  about  $5,000. 

Further  information  on  the  Annual  Demographic  File  or  other 
special  files  from  the  CPS  can  be  obtained  from  the  Users  Service 
Staff, Data User Services Division, Bureau of the Census, Washington,
D.C. 20233.


Appendix A. Items on the Annual Demographic File


Family characteristics records

Character
location      Item

F1            Record type
F2-26         Serial number and components of variance estimator
F27-28        Type of family
F29-30        Age of head
F31           b
F32           Sex of head
F33           Race of head
F34-35        Number of person records for family
F36           b
F37-38        State or State group code
F39-40        SMSA code
F41           b
F42           Metropolitan/non-metropolitan residence
F43           Household ID
F44           b
F45-46        Type of living quarters
F47-48        Number of family records for this household
F49-50        Number of person records for this household
F51           Number of nonrelatives of head
F52           Type of living quarters
F53-54        Size of family
F55-56        Number of family members under 18
F57           Number of family members 18-64
F58           Number of family members 65 and over
F59-60        Number of own children
F61-62        Number of own children under 25
F63-64        Number of own children under 18
F65           Number of own children under 3
F66           Number of own children under 1
F67           Presence of own children under 18, of specified ages
F68           b
F69-70        Number of related children under 18
F71-72        b
F73-78        Family income
F79-84        Family earnings
F85-90        Family income other than earnings
F91-96        Household income
F97-102       Income of husband and wife
F103-108      Poverty cutoff
F109          Poverty status
F110          b
F111-112      Source of family income
F113          Members in labor force
F114          Number of earners
F115          Labor force status of wife
F116          Employment status of head
F117-130      Income allocation flags
F131-144      b
F145          Spanish language (1971, 1972 only)
F146          Spanish origin (1971-1973)
F147-204      b
F205-216      Family weight
F217-360      b

Person characteristics records

Character
location      Item

P1            Record type
P2-26         Serial numbers
P27-28        Type of family
P29-30        Age
P31           Quarter of birth
P32           Sex
P33           Race (white, Negro, other)
P34-35        Serial number within family
P36-37        Mobility status (residence 1 year ago) (1968-1971)
P38           Relationship to head
P39           Family membership key
P40           Subfamily membership
P41-42        Detailed relationship summary
P43-44        Family relationship summary
P45           Marital status
P46           Quarter of first marriage (1968-1971)
P47-48        b
P49-50        Decade and year of first marriage (1968-1971)
P51-52        Years since first marriage (1968-1971)
P53-54        Age at first marriage (1968-1971)
P55           Veteran status
P56           b
P57-58        Highest grade of school attended
P59           Grade completed
P60           b
P61-66        Total income
P67-72        Wage and salary income
P73-78        Nonfarm self-employment income
P79-84        Farm self-employment income
P85-90        Social security or railroad retirement income
P91-96        Dividend income
P97-102       Welfare or public assistance income
P103-108      Unemployment compensation, pension income
P109-114      Other income (e.g., alimony, etc.)
P115-128      Income allocations flags (not for 1968)
P129-130      Source of income
P131          Weeks worked last year
P132          Full-time part-time
P133-135      Industry of longest job held last year
P136-138      Occupation of longest job held last year
P139          Class of worker last year
P140          b
P141-142      Summary industry (last year)
P143-144      Summary occupation (last year)
P145          Weeks looking for work or on layoff last year
P146          Reason for part-year work
P147          Weeks in labor force (part-year workers)
P148          Stretches of unemployment
P149          Reason for not working
P150          Weeks in labor force (nonworkers)
P151          Major activity last week
P152          Employment status
P153-154      Hours worked last week
P155          Work usually 35 hours or more
P156          b
P157-158      Reason less than 35 hours a week
P159          Reason absent from work
P160          Wages for any of the time off
P161          Work usually 35 hours or more a week (persons not at work)
P162          Reason start looking for work
P163-164      Weeks unemployed
P165          Looking for full- or part-time work
P166-167      Reason could not take job last week
P168          When last worked at full-time job
P169-171      Current industry (3-digit level)
P172-174      Current occupation (3-digit level)
P175          Class of worker
P176          b
P177-178      Summary industry (48 groups)
P179-180      Summary occupation (37 groups)
P181-182      Hours worked
P183          When last worked at regular job
P184          b
P185-186      Major occupation (13 categories)
P187-188      Major industry (21 categories)
P189-195      Methods used to find work
P196          Spanish spoken in household (1971 only)
P197-198      Ethnic origin
P199          Labor union membership (1971 only)
P200-204      b
P205-216      Persons supplement weight
P217-360


Note: b means blank or not used.


Appendix B. Geographic Areas Identified on the Annual Demographic File


States  and  State  groups 


1968-1972 

1973  and  future  years 

NEW  ENGLAND 

NEW  ENGLAND 

Connecticut 

Connecticut 

Combined : 

Maine 

Massachusetts 

New  Hampshire 

Combined : 

Maine 

Vermont 

New  Hampshire 

Rhode  Island 

Vermont 

Massachusetts 

Rhode  Island 

MIDDLE  ATLANTIC 

MIDDLE  ATLANTIC 

New  York 

New  York 

New Jersey

New  Jersey 

Pennsylvania

Pennsylvania 

EAST  NORTH  CENTRAL 

EAST  NORTH  CENTRAL 

Ohio 

Ohio 

Indiana 

Indiana 

Illinois 

Illinois 

Combined : 

Michigan 

Combined : 

Michigan 

Wisconsin 

Wisconsin 

WEST  NORTH  CENTRAL 

WEST  NORTH  CENTRAL 

Missouri 

Combined : 

Minnesota 

Combined : 

Minnesota 

Iowa 

Iowa 

Missouri 

Combined : 

North  Dakota 

North  Dakota 

South  Dakota 

South  Dakota 

Nebraska 

Nebraska 

Kansas 

Kansas 

SOUTH  ATLANTIC 

SOUTH  ATLANTIC 

District of Columbia

District of Columbia

Maryland 

North  Carolina 

West  Virginia 

Florida 

Georgia 

Combined : 

Delaware 

Florida 

Maryland 

Combined : 

Delaware 

Virginia 

Virginia 

West  Virginia 

Combined : 

North  Carolina 

Combined : 

South  Carolina 

South  Carolina 

Georgia 

EAST  SOUTH  CENTRAL 

EAST  SOUTH  CENTRAL 

Kentucky 

Combined : 

Kentucky 

Tennessee

Tennessee 

Combined : 

Alabama 

Combined : 

Alabama 

Mississippi 

Mississippi 

WEST  SOUTH  CENTRAL 

WEST  SOUTH  CENTRAL 

Louisiana 

Texas 

Texas 

Combined : 

Arkansas 

Combined:  Arkansas 

Louisiana 

Oklahoma 

Oklahoma 

MOUNTAIN 

MOUNTAIN 

Combined:  Colorado 

Combined : 

Montana 

New  Mexico 

Idaho 

Arizona 

Wyoming 

Combined:  Idaho 

Colorado 

Montana 

New  Mexico 

Wyoming 

Arizona 

Utah 

Utah 

Nevada 

Nevada 

PACIFIC 

PACIFIC 

Oregon 

California 

California 

Combined : 

Washington 

Combined:  Alaska 

Oregon 

Hawaii 

Alaska 

Washington 

Hawaii 

Standard metropolitan statistical areas


1968-1972                       1973 and future years

New York¹                       New York
Los Angeles-Long Beach²         Los Angeles-Long Beach
Chicago                         Chicago
Philadelphia                    Philadelphia
Detroit                         Detroit
San Francisco-Oakland*          San Francisco-Oakland
Boston*                         Washington, D.C.
Pittsburgh                      Boston
St. Louis*                      Nassau-Suffolk
Washington, D.C.*               Pittsburgh
Cleveland*                      St. Louis
Baltimore*                      Baltimore
Newark                          Cleveland
Minneapolis-St. Paul            Houston
Buffalo                         Newark
Houston*                        Minneapolis-St. Paul
Milwaukee*                      Dallas
Paterson-Clifton-Passaic        Seattle-Everett
Dallas*                         Anaheim-Santa Ana-Garden Grove
                                Milwaukee
                                Atlanta
                                Cincinnati³
                                Paterson-Clifton-Passaic
                                San Diego
                                Buffalo
                                Miami
                                Kansas City
                                Denver
                                Riverside-San Bernardino-Ontario
                                Indianapolis⁴
                                San Jose
                                New Orleans
                                Tampa-St. Petersburg
                                Portland
                                Phoenix

* SMSA's are defined according to 1960 boundaries in the 1968-1972 ADF's. These SMSA's
changed boundaries for the 1970 census.

¹ Corresponds to New York SMSA and Nassau-Suffolk SMSA in 1973 and future files.

² Corresponds to Los Angeles-Long Beach SMSA and Anaheim-Santa Ana-Garden Grove SMSA in 1973
and future files.

³ The Kentucky part of the Cincinnati SMSA is omitted.

⁴ Only the central city-county of the Indianapolis SMSA is identified.


Comments on Using

Current  Population  Survey  Data  Files 

Harold  Beebout 

Mathematica,  Inc.,  Washington,  D.C. 

INTRODUCTION 

Following the precedent of one discussant responding to one
paper, I would like to discuss the problems and opportunities of
Current Population Survey (CPS) data files from a particular user
perspective, that of using microsimulation for public policy
analysis. Our work has been primarily concerned with analysis of
the eligible population for current income transfer programs such
as Aid to Families with Dependent Children, Supplemental Security
Income, and food stamps, and for proposed income maintenance
programs ranging from negative income taxes to guaranteed
employment. The simulated eligible populations are used in preparing
estimates of caseloads, budget costs, impact on the
after-tax-after-transfer distribution of income, and the antipoverty
effectiveness of the various current or proposed transfer programs.

The  March  Current  Population  Survey  data  files  discussed  by 
Paul  Zeisset  are  the  data  files  we  use  most  frequently  as  the  base 
for  these  analyses,  although  we  also  make  use  of  the  Public  Use 
Samples  from  the  1970  census  and  the  1967  Survey  of  Economic 
Opportunity.  The  CPS  has  proven  to  be  a  most  useful  data  file  for 
our  simulation  analyses.  It  is  a  very  rich  source  of  demographic, 
labor  force,  and  income  data.  The  data  files  are  generally  well 
edited  and  easily  used  once  the  user  invests  the  time  necessary  to 
learn  the  structure  of  the  files  and  the  concepts  behind  the  data. 
Most  of  the  important  information  needed  to  accurately  reproduce 
the  distribution  of  tax  and  transfer  programs  is  available  from  the 
CPS.  After  imputing  information  on  a  few  missing  variables,  our 
simulated  distributions  of  FICA  payroll  taxes  and  Federal  income 
taxes  compare  favorably  with  relevant  administrative  program  data, 
and  even  the  incomprehensible  public  assistance  is  consistent  with 
program  data. 

SUGGESTIONS  FOR  IMPROVING  THE  CPS  DATA  FILE 

There  are,  however,  a  few  missing  pieces  of  information  that 
would  greatly  improve  the  ability  of  researchers  to  use  the  CPS  for 
transfer  policy  analysis.  These  missing  pieces  are  in  three  areas: 
Income  and  assets,  disability,  and  family  structure.  In  the  income 
and  assets  area  we  have  the  following  four  suggestions: 


1. Ask and record each of the detailed types of unearned
income separately so that every user does not have to do his
own allocation. (We understand changes of this type are under
consideration.)

2. Include questions on in-kind transfer income to the extent
feasible. (At least include food stamps, for which the June
CPS data appear reasonably good.)

3. Include a wage rate question such as we understand was
asked on the May CPS. This is needed for research on two areas
of crucial interest to social policy: (1) estimating participation
in and cost of wage subsidies, as well as public employment
programs, and (2) estimating the labor supply response to transfer
programs. Researchers using CPS data have resorted to using
an unsatisfactory proxy for the wage rate derived from wage
and salary income, weeks worked last year (a class variable), and
hours worked last week.

4. Include a measure of family assets even if the result is
rather crude. This is needed to accurately determine eligibility
for most income tested grants and is particularly important for
determining SSI eligibility as well as for determining poverty and
economic well-being.
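
The wage-rate proxy mentioned in suggestion 3 might be computed along the following lines. This is only a sketch: the class codes and week intervals for weeks worked last year are hypothetical stand-ins for whatever coding the file actually uses, and the function name is illustrative. The proxy divides annual wage and salary income by the product of an assumed midpoint of the weeks-worked class and hours worked last week.

```python
# Hypothetical class codes and week intervals; the file's actual coding may differ.
WEEKS_CLASS_MIDPOINT = {
    1: 7.0,    # 1-13 weeks
    2: 20.0,   # 14-26 weeks
    3: 33.0,   # 27-39 weeks
    4: 43.5,   # 40-47 weeks
    5: 48.5,   # 48-49 weeks
    6: 51.0,   # 50-52 weeks
}


def hourly_wage_proxy(wage_income, weeks_class, hours_last_week):
    """Approximate hourly wage as annual wage income / (weeks x hours).

    Returns None when the proxy is undefined, that is, when the weeks
    class is unknown or no hours were reported for last week.
    """
    weeks = WEEKS_CLASS_MIDPOINT.get(weeks_class)
    if not weeks or hours_last_week <= 0:
        return None
    return wage_income / (weeks * hours_last_week)


full_year = hourly_wage_proxy(8160, 6, 40)   # 8160 / (51 * 40)
part_year = hourly_wage_proxy(1400, 1, 20)   # 1400 / (7 * 20)
no_hours = hourly_wage_proxy(5000, 6, 0)     # undefined: no hours last week

print(full_year, part_year, no_hours)
```

The sketch also shows why the proxy is unsatisfactory: a class midpoint stands in for actual weeks worked, and hours worked in a single recent week stand in for hours worked throughout the previous year, so a person who was not at work last week has no usable value at all.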

In  the  disability  area  we  suggest  the  two  1970  census  questions 
on  the  degree  and  length  of  health  limitations  be  added  to  the  CPS. 

We  recognize  the  problems  of  long  interviews  but  suggest  that,  if 
length  is  an  insurmountable  problem,  some  of  the  information  be 
asked  in  the  succeeding  month  and  matched  in  by  the  Census 
Bureau.  Of  course  the  portion  of  the  sample  rotated  out  could  not 
be  matched. 

In the family structure area we suggest that the Census Bureau
move away from the nuclear family concept, which is an increasingly
unrealistic description of both the way many families live and
the way they are treated by social programs. More specifically, the
head should be defined as the person contributing the greater portion
of the unit's income. Children's relationships to the head and
spouse should be defined as own or adopted or stepchild. A
specific example of this need is our inability to identify male-headed
stepchild AFDC eligibles, who are an increasingly
important segment of the caseload.


CONCLUSION 

The  CPS  is  already  a  vitally  important  source  of  data  for  the 
microsimulation  and  associated  policy  analysis  community.  We  are 
pleased  that  the  Bureau  of  the  Census  is  making  these  files  more 
generally available through the public use file versions. We compliment
the Bureau on the quality and scope of the current CPS and
hope that these comments and suggestions are helpful in developing
an even more useful data file.